MIPS Registers

2008-2009

Informatics 3 - Computer Architecture 40

Additional notes:

Inf3 Computer Architecture - 2007-2008 40

Register Usage in MIPS ABI

Register Number   Soft Name   ABI function for this register
$0                zero        always contains zero
$1                at          reserved for assembler
$2-$3             v0,v1       integer function result (out) or static link (in)
$4-$7             a0-a3       first 4 integer-type function arguments
$8-$15            t0-t7       temporary registers for expression evaluation
$16-$23           s0-s7       registers preserved across function call
$24-$25           t8,t9       temporary registers for expression evaluation
$28               gp          global pointer
$29               sp          stack pointer
$30               fp          frame pointer
$31               ra          return address

The ABI gives a well-understood function to each register in the general-purpose register set. Some uses are obvious, such as the stack pointer. There are also three other special registers: the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is assigned the return address when a function call is made. Software will put this value on the stack if the called function itself calls further functions. The fp register points to the base of the stack frame for the current function; we'll see that in the next slide. The gp register, when used, points to a pool of global data that can be commonly referenced by all functions. This may include variables with file or global scope.

A function can use registers t0-t9 freely, but if it calls another function they may be overwritten. A function must not leave s0-s7 changed on return: if it wants to use them, it must save their original contents and restore them before returning. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.
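The register table can also be written down as data. The following is a minimal sketch in C, not part of any toolchain: the type and function names (abi_reg, abi_lookup, save_class) are made up for illustration, and only the caller-saved and callee-saved classes stated in the notes are recorded; every other register is simply marked OTHER.

```c
#include <stddef.h>

/* Who has to preserve a register across a call, as far as the notes say. */
enum save_class { OTHER, CALLER_SAVED, CALLEE_SAVED };

struct abi_reg {
    int first, last;            /* inclusive range of register numbers */
    const char *names;          /* software names from the table */
    enum save_class saved_by;
};

/* One row per line of the register table (hypothetical layout). */
static const struct abi_reg abi_table[] = {
    {  0,  0, "zero",  OTHER },
    {  1,  1, "at",    OTHER },
    {  2,  3, "v0-v1", OTHER },
    {  4,  7, "a0-a3", OTHER },
    {  8, 15, "t0-t7", CALLER_SAVED },
    { 16, 23, "s0-s7", CALLEE_SAVED },
    { 24, 25, "t8-t9", CALLER_SAVED },
    { 28, 28, "gp",    OTHER },
    { 29, 29, "sp",    OTHER },
    { 30, 30, "fp",    OTHER },
    { 31, 31, "ra",    OTHER },
};

/* Return the table row covering regnum, or NULL if the table has no row
 * for it ($26 and $27 are not listed in the table). */
const struct abi_reg *abi_lookup(int regnum)
{
    size_t i;

    for (i = 0; i < sizeof abi_table / sizeof abi_table[0]; i++)
        if (regnum >= abi_table[i].first && regnum <= abi_table[i].last)
            return &abi_table[i];
    return NULL;
}
```

For example, abi_lookup(16) returns the s0-s7 row, whose saved_by field is CALLEE_SAVED.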

Functions and Stack Frames

int bar (int n);

int foo (int i)
{
    return bar (i);
}

int bar (int n)
{
    int a = n+1, b = n-1;
    return (a*b);
}

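Each function has a dynamically allocated stack frame

Frame contents are normally accessed by addresses that are relative to either the stack pointer $sp or the frame pointer $fp

[Figure: memory drawn with high addresses at the top and low addresses at the bottom, holding the stack frame for foo, then the stack frame for bar, then free stack space; $fp and $sp mark the frame of the function currently executing; the stack usually grows downwards]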

Stacks usually grow downwards in memory. Can you think why this might be?

Anatomy of a Stack Frame

int bar (int n);

int foo (int i)
{
    return bar (i);
}

int bar (int n)
{
    int a = n+1, b = n-1;
    return (a*b);
}

Positive offsets from $fp = args

Negative offsets from $fp = locals

Not all portions of frame are needed by all functions

Callee save space holds previous $fp, $ra, and any $s0-$s7 that are modified by function bar

[Figure: stack frame for foo above the stack frame for bar, high addresses at the top and low addresses at the bottom. The frame for bar contains, from high to low addresses: incoming args (at $fp), callee-save space, local variables, outgoing args (down to $sp). Free stack space lies below.]

The incoming arguments are values passed from foo to bar. Some of the args may be passed in registers and may not need space on the stack. The callee save space is a region that bar can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may require some storage space on the stack. The outgoing args space is where args for functions that bar calls will be stored. This space will become the incoming args space of functions that bar calls (if any). If bar calls several functions, then the outgoing args space would typically be the maximum space needed by any such function, allowing it to be allocated once.
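The sizing rule in the last sentence can be sketched in C. This is an illustration under simplifying assumptions, not a description of any real compiler: the function names are hypothetical, all sizes are plain byte counts, and the alignment padding that a real ABI would add is not modelled.

```c
/* Largest stack-argument area needed by any one of the functions called.
 * callee_arg_bytes[i] is the number of argument bytes the i-th called
 * function expects to find on the stack. */
unsigned outgoing_arg_bytes(const unsigned *callee_arg_bytes,
                            unsigned n_callees)
{
    unsigned i, max = 0;

    for (i = 0; i < n_callees; i++)
        if (callee_arg_bytes[i] > max)
            max = callee_arg_bytes[i];
    return max;
}

/* Bytes a function allocates for its own frame: callee-save space,
 * local variables, and one outgoing-args area sized for the most
 * demanding callee. Incoming args are not counted, because they live
 * in the outgoing-args area of the caller. */
unsigned frame_bytes(unsigned callee_save_bytes, unsigned local_bytes,
                     const unsigned *callee_arg_bytes, unsigned n_callees)
{
    return callee_save_bytes + local_bytes
         + outgoing_arg_bytes(callee_arg_bytes, n_callees);
}
```

A function that saves 12 bytes of registers, has 8 bytes of locals and calls three functions needing 16, 24 and 8 bytes of stack arguments would allocate 12 + 8 + 24 = 44 bytes under this model.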

Call Return Sequencing

Call sequence
  Save caller-saved registers
  Copy arguments to stack or regs
  Call the function

Return sequence
  Restore caller-saved registers

Function Prologue
  Allocate callee's stack frame
  Reposition frame pointer
  Save callee-saved registers

< execute body of function >

Function Epilogue
  Restore callee-saved registers
  Restore frame pointer
  De-allocate callee's stack frame
  Return to caller

Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your workstation to produce an assembler file, and identify the four sequences listed in this slide. To do this type:

gcc -O -S -o assembler.lis program.c

Where assembler.lis is the file in which your assembler code will be produced, and program.c is the name of your C source file containing foo() and bar().

Categorising Data by Location and Access

C programs contain several categories of data, according to where they live and how they are created

The way addresses are computed depends on the category of access

Classification                    Where data is located                           Addressing mode         How created
Embedded constants                Often in a constant pool in the .text section   $pc + signed offset     Static, read-only
Global and static variables       .bss section                                    $gp + signed offset     Static, read or write
Dynamically allocated variables   On the heap                                     GPR + offset            Dynamic, malloc(), free()
Automatic variables               On stack, below frame pointer                   $fp + negative offset   Dynamic, function scope
Function arguments                On stack, above frame pointer                   $fp + positive offset   Dynamic, function scope

Each category of data, whether a function argument or an automatic variable, is allocated in a different way, and is therefore accessed in a different way. There are well-defined regions, such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp, $gp) or may be accessed relative to $pc or a general-purpose register.
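As a rough illustration, the short C fragment below contains one object from each category in the table. The names (limit, counter, sum_to) are invented for this sketch, and the comments describe the typical placement only; where a particular compiler and ABI actually put each object, and which addressing mode they use, can differ.

```c
#include <stdlib.h>

const int limit = 100;   /* static, read-only */
int counter;             /* static, read or write: a global, typically in .bss */

/* Sum the integers 0..n through a heap buffer; returns -1 on bad input. */
int sum_to(int n)        /* n is a function argument */
{
    static int calls;    /* static variable: same storage class as a global */
    int total = 0;       /* automatic variable, in the stack frame or a register */
    int i;
    int *buf;            /* the pointer is automatic; what it points at is not */

    if (n < 0 || n > limit)
        return -1;
    buf = malloc((size_t)(n + 1) * sizeof *buf);   /* dynamically allocated */
    if (buf == NULL)
        return -1;
    for (i = 0; i <= n; i++)
        buf[i] = i;      /* accessed as register + offset */
    for (i = 0; i <= n; i++)
        total += buf[i];
    free(buf);
    calls++;
    counter = calls;     /* record how many successful calls were made */
    return total;
}
```

Compiling this with gcc -S and looking at how limit, counter, calls, total, n and buf[i] are addressed is a quick way to compare against the table.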

Addressing Mode Frequency

Bottom-line: few addressing modes account for most of the instructions in programs

Frequency of the addressing mode (%), H&P Fig. 2.7

Addressing mode   TeX   spice   gcc
Indirect            1       6     1
Scaled              0      16     6
Register           24       3    11
Immediate          43      17    39
Displacement       32      55    40

In practice, compilers usually convert complex address calculations into unsigned integer computations and then use very simple addressing modes based on computed addresses. Many memory references are to variables located on the stack. These always use [sp + offset] addressing modes, making the Displacement mode one of the most common.

Try compiling a simple piece of C code into assembler and look at the addressing modes obtained for each variable accessed by the code.

Hint: gcc -S foo.c

Displacement Addressing and Data Classification

Stack pointer and Frame pointer relative
  Compiler can often eliminate frame pointer
    Function must not call alloca()
  5 to 10 bits of offset is sufficient in most cases

Register + offset
  Generic form for accessing via pointers
  Multi-dimensional arrays require address calculations

PC relative addresses
  Useful for locating commonly-used constants in a pool of constants located in the .text section

Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets compiled. Try "man alloca" if unsure how to use it.

Floating point arithmetic

Usually based on the IEEE 754 floating point standard

Useful when greater range of number is required
  Integer (m bits): -2^(m-1) .. +2^(m-1) - 1
  Floating point (largest value):
                       Binary                  Decimal
    Single precision   (2 - 2^-23) x 2^127     ~ 10^38.53
    Double precision   (2 - 2^-52) x 2^1023    ~ 10^308.25

See Hennessy & Patterson appendix for details of formats and operations
  Set aside an hour to read their appendix and become familiar with the overall structure of the FP standard (don't memorise details; you can always refer back to the standard if you ever need to use it)

Key points for instruction sets:
  Integer and Floating Point never mixed in same operation
  Separate register sets for integer and FP operations are therefore common
  Floating point operations often optional or omitted from embedded processors
  Other ways to represent fractional values, e.g. fixed-point types

Follow the suggested reading in Hennessy and Patterson given in the slide. Make summary notes here.
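The two largest-value expressions from the slide, (2 - 2^-23) x 2^127 and (2 - 2^-52) x 2^1023, can be checked against the limits that the C library publishes in float.h. The sketch below assumes the platform's float and double are IEEE 754 single and double precision, which C does not guarantee; the function names are invented for illustration.

```c
#include <float.h>

/* Multiply x by 2 the given number of times. Each doubling is exact in
 * binary floating point as long as the result stays in range. */
static double scale_by_pow2(double x, int exponent)
{
    while (exponent-- > 0)
        x *= 2.0;
    return x;
}

/* Largest single-precision value: (2 - 2^-23) x 2^127. */
double largest_single(void)
{
    double significand = 2.0 - 1.0 / 8388608.0;           /* 2^23 */
    return scale_by_pow2(significand, 127);
}

/* Largest double-precision value: (2 - 2^-52) x 2^1023. */
double largest_double(void)
{
    double significand = 2.0 - 1.0 / 4503599627370496.0;  /* 2^52 */
    return scale_by_pow2(significand, 1023);
}

/* Non-zero if both formulas agree with the limits in <float.h>. */
int matches_float_h(void)
{
    return largest_single() == (double)FLT_MAX
        && largest_double() == DBL_MAX;
}
```

On a machine with IEEE 754 arithmetic, matches_float_h() should return non-zero.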

Encoding the Instruction Set

How many bits per instruction?
  Fixed-length 32-bit RISC encoding
  Variable-length encoding (e.g. Intel x86)
  Compact 16-bit RISC encodings
    ARM Thumb
    MIPS16
    ARCompact

Formats define instruction groups with a common set of operands

An instruction format defines a set of operands that are used in common by a group of instructions. An instruction set is simply a collection of formats and the operations defined for each format.

Design considerations for ISA encoding

How compact is the encoding?

Is the encoding orthogonal?

How easy is it to extract operands unambiguously?
  Register specifiers should be aligned in all formats (ideally)
  Implicitly defined registers will complicate decode
  How are the literals aligned and/or extended?

Are control transfers easily identifiable?
  If not, slow decoding of branches may increase CPI

Op-code assignment:
  Minimise Hamming distance between codes that perform similar operations
  Leads to simpler and faster decode logic

If you don't know what Hamming distance is, see page 193 of Andrew Tanenbaum, Computer Networks, 4th edition (a standard text in communications). A Google search will also find the definition. Think about why this is useful in instruction set design, and then make notes here as a reminder.
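For reference, the Hamming distance between two codes is the number of bit positions in which they differ. A minimal C sketch (the function name is our own):

```c
/* Number of bit positions in which a and b differ. */
unsigned hamming_distance(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;     /* a 1 bit wherever the two codes differ */
    unsigned count = 0;

    while (diff != 0) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
```

As an example, the MIPS function codes for add (0x20), addu (0x21) and sub (0x22) are close together in this sense: hamming_distance(0x20, 0x21) and hamming_distance(0x20, 0x22) are both 1, so a decoder can recognise the whole group from the bits they share.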

MIPS 32-bit Instruction Formats

R-type (register to register)
  three register operands
  most arithmetic, logical and shift instructions

I-type (register with immediate)
  instructions which use two registers and a constant
  arithmetic/logical with immediate operand
  load and store
  branch instructions with relative branch distance

J-type (jump)
  jump instructions with a 26 bit address

At this point you will find it helpful to read Appendix B from Hennessy and Patterson (4/e), "Putting it all together: The MIPS Architecture", p. B-32. Appendix B is all about ISA design issues, using the MIPS architecture as a teaching vehicle.

MIPS R-type instruction format

Field             opcode    rs       rt       rd       shamt    funct
Width             6 bits    5 bits   5 bits   5 bits   5 bits   6 bits

add $1, $2, $3    special   $2       $3       $1       0        add
sll $4, $5, 16    special   0        $5       $4       16       sll

Make your own list of instructions that follow this format.
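To make the field layout concrete, here is a small C sketch that packs the six R-type fields into a 32-bit word. The function name is invented for illustration; the opcode "special" is 0, and the function codes used in the example (0x20 for add, 0x00 for sll) are the standard MIPS32 values.

```c
#include <stdint.h>

/* Pack an R-type instruction: opcode (6) | rs (5) | rt (5) | rd (5) |
 * shamt (5) | funct (6). The opcode field is always 0 ("special"). */
uint32_t encode_rtype(unsigned rs, unsigned rt, unsigned rd,
                      unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs    & 0x1fu) << 21)
         | ((uint32_t)(rt    & 0x1fu) << 16)
         | ((uint32_t)(rd    & 0x1fu) << 11)
         | ((uint32_t)(shamt & 0x1fu) << 6)
         |  (uint32_t)(funct & 0x3fu);
}
```

With this layout, add $1, $2, $3 is encode_rtype(2, 3, 1, 0, 0x20) = 0x00430820, and sll $4, $5, 16 is encode_rtype(0, 5, 4, 16, 0x00) = 0x00052400.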

MIPS I-type instruction format

Field                opcode    rs       rt       immediate value/addr
Width                6 bits    5 bits   5 bits   16 bits

lw $1, offset($2)    lw        $2       $1       address offset
beq $4, $5, .L001    beq       $4       $5       (.L001 - (PC + 4)) >> 2
addi $1, $2, -10     addi      $2       $1       0xfff6

Find more examples of instructions that follow this format and write them here.
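The I-type layout can be sketched in the same way. The helper names below are made up for illustration. The opcode values used in the example (8 for addi, 4 for beq) are the standard MIPS32 ones, and the branch offset is counted in instructions from the address after the branch, as on MIPS.

```c
#include <stdint.h>

/* Pack an I-type instruction: opcode (6) | rs (5) | rt (5) | imm (16).
 * A negative immediate is stored as its 16-bit two's complement pattern. */
uint32_t encode_itype(unsigned opcode, unsigned rs, unsigned rt, int imm)
{
    return ((uint32_t)(opcode & 0x3fu) << 26)
         | ((uint32_t)(rs     & 0x1fu) << 21)
         | ((uint32_t)(rt     & 0x1fu) << 16)
         | ((uint32_t)imm & 0xffffu);
}

/* Immediate field of a branch at branch_pc that targets 'target': the
 * distance in instructions from the instruction after the branch. Both
 * addresses are assumed to be multiples of 4 and close enough together
 * for the result to fit in 16 bits. */
int branch_offset(uint32_t branch_pc, uint32_t target)
{
    return (int)(int32_t)(target - (branch_pc + 4)) / 4;
}
```

For example, addi $1, $2, -10 is encode_itype(8, 2, 1, -10) = 0x2041fff6, which shows the 0xfff6 immediate from the slide in the low 16 bits.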

MIPS J-type instruction format

Field        opcode    address
Width        6 bits    26 bits

call func    call      absolute func address >> 2

Again, find other examples of MIPS instructions that use this format.
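A J-type word can be packed the same way. In this sketch the function name is invented, and the example assumes the MIPS jal instruction (opcode 3) as the "call" shown in the slide. Only 26 bits of the word address fit in the instruction; on MIPS the top four address bits are taken from the program counter at run time.

```c
#include <stdint.h>

/* Pack a J-type instruction: opcode (6) | address (26), where the
 * address field holds bits 27..2 of the byte address of the target. */
uint32_t encode_jtype(unsigned opcode, uint32_t target_addr)
{
    return ((uint32_t)(opcode & 0x3fu) << 26)
         | ((target_addr >> 2) & 0x03ffffffu);
}
```

For instance, a jal to byte address 0x00400020 would be encode_jtype(3, 0x00400020) = 0x0c100008.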

Code density optimisations

Prologue and Epilogue

Constant pools and PC relative loads

2-register formats

Restricted register sets

Non-orthogonality and implicit register operands

Read section B.10, Fallacies and Pitfalls, on page B-39 of Hennessy & Patterson. Make brief notes here to remind you of the main points.

Examples:

Instruction Set Architecture   Instruction Size      GP registers             Special Features
ARCompact                      Mixed 16 and 32 bit   8 direct, 32 available   Freely-mixed compact and 32-bit instructions; long-immediate data
ARM Thumb                      16 bit                8                        push and pop for stack frame support
MIPS16                         16 bit                8                        Some special ABI registers still accessible

Most 32-bit architectures used in embedded systems have acquired a subset that is encoded in 16 bits. These instructions still operate on 32-bit data, but are encoded more efficiently. Generally speaking they all use two register operands rather than three, and also restrict the number of general purpose registers to 8. The ARCompact instruction set allows a free mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not permitted in ARM Thumb or MIPS16, where each function must be compiled into the 32-bit or the 16-bit instruction set. Recently, ARM introduced the Thumb2 instruction set which removes that restriction.

ARM Thumb Push and Pop instructions

Particularly effective for encoding function entry and exit code in a compact form.

Operand is a bit vector, with each bit specifying whether one of the callee saved registers should be pushed or popped.

Push may also save the link register (equiv. to MIPS $ra)

Pop may then pop that value directly into PC, causing the function to return to the caller. E.g.

    push { r4, r5, r6, r7, lr }
    pop { r4, r5, r6, r7, pc }

These are multi-cycle operations, performing one memory read or write per listed register (5 each in this example).

Complex to implement, but highly effective in terms of code density

Prologue and epilogue can account for 10-15% of the code space
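The bit-vector operand can be illustrated with the 16-bit Thumb encodings of these two instructions. The sketch below uses hypothetical function names; bit i of low_regs selects register r(i) for i = 0..7, and one extra bit adds lr to a push or pc to a pop.

```c
#include <stdint.h>

/* 16-bit Thumb PUSH: 1011 010M rrrr rrrr, where M = 1 also pushes lr. */
uint16_t thumb_push(unsigned low_regs, int with_lr)
{
    return (uint16_t)(0xB400u | (with_lr ? 0x0100u : 0u)
                              | (low_regs & 0xFFu));
}

/* 16-bit Thumb POP: 1011 110P rrrr rrrr, where P = 1 also pops into pc. */
uint16_t thumb_pop(unsigned low_regs, int with_pc)
{
    return (uint16_t)(0xBC00u | (with_pc ? 0x0100u : 0u)
                              | (low_regs & 0xFFu));
}
```

Registers r4 to r7 correspond to the mask 0xF0, so the pair in the slide encodes as thumb_push(0xF0, 1) = 0xB5F0 and thumb_pop(0xF0, 1) = 0xBDF0: a complete register save or restore in a single 16-bit instruction.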

Try to find other Instruction Set Architectures that support multi-register move operations. List them here:

Instruction Frequency

Bottom-line: few instruction types account for most of the instructions executed

80x86 instruction         Fraction (%)
load                      22
conditional branch        20
compare                   16
store                     12
add                        8
and                        6
sub                        5
move register-register     4
call                       1
return                     1
Total                     96

H&P Fig. 2.16

Bear in mind that each architecture is different, but that in general the frequencies shown above are representative of typical desktop applications.

Embedded applications often see increasing frequencies of signal processing operations, especially 16-bit multiplications.

IS and Performance

ISA and Implementation: cycle time, pipelining, CPI, instruction length

ISA and Compiler: instruction scheduling, code motion, branch optimizations, code generation, code size, register allocation

Compiler and Implementation: instruction delays, register allocation, functional units

[Figure: ISA, Compiler and Implementation together determine Performance]

This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.

IS Guidelines

Regularity: operations, data types, addressing modes, and registers should be independent (orthogonal)

Primitives, not solutions: do not attempt to match HLL constructs with special IS instructions

Simplify tradeoffs: make it easy for compiler to make choices based on estimated performance

Trust compiler: provide compiler with instructions and primitives that exploit knowledge at compile-time

Instruction sets can vary enormously from one architecture to another. However, within the set of all RISC architectures there are actually few substantial differences.

It is also worth noting that the number of distinct desktop architectures has been decreasing year on year. In 2007 most new desktop systems shipped have x86 processors. In the server space one can still find Sun SPARC and IBM PowerPC architectures.

The embedded computing domain has a much greater diversity of architectures. Can you think why this might be?

Improving CPU Performance (H&P 2.11; A.1; A3)

CPU performance can be computed by the CPU performance equation:
  CPU time = IC x CPI x Clock time

To reduce CPU time: reduce IC, clock period, or CPI

ISA influences implementation, compiler optimizations, and therefore performance

ISA must be an easy compiler target

No need to provide too many and too complex instructions

Compiler has a significant role in improving performance

Essentially, to improve CPU performance we must reduce one of the three primary contributors (IC, CPI or clock period), or else issue more than one instruction per cycle (or both!)
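As a worked example of the performance equation, the following C sketch simply multiplies the three contributors. The function name and the sample figures are illustrative assumptions, not measurements.

```c
/* CPU time = IC x CPI x clock period. The result is in whatever time
 * unit the clock period is given in (nanoseconds in the example). */
double cpu_time(double instruction_count, double cpi, double clock_period)
{
    return instruction_count * cpi * clock_period;
}
```

A program of 1000 instructions with a CPI of 1.5 on a 2 ns clock takes cpu_time(1000.0, 1.5, 2.0) = 3000 ns. Lowering the CPI to 1.0 with the other two factors unchanged brings that down to 2000 ns.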

Program Structure: Basic-Blocks (BB)

Definition: straight-line code with single entry and single exit

Boundaries:
  Branches and jumps
  Calls and returns
  Targets of branches, jumps, calls, and returns

            lw   r2,0(r1)          BB1
            lw   r3,4(r1)
            addi r3,r3,n
            bne  r2,r3,Label2
    Label1: lw   r4,8(r1)          BB2
            sub  r2,r2,m
            beq  r2,r0,Label1
    Label2: add  r1,r1,r3          BB3

[Figure: control-flow graph of the three blocks, with BB1 above BB2 and BB3]
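The boundary rules can be turned into a short routine that marks the first instruction (the "leader") of every basic block. This is a minimal sketch with invented type and function names; instructions are reduced to a kind and an optional target index, and the eight-instruction sequence from the slide is used as the example.

```c
enum kind { PLAIN, BRANCH };    /* BRANCH: any branch, jump, call or return */

struct insn {
    enum kind kind;
    int target;                 /* index of the target, or -1 if none/unknown */
};

/* Set leader[i] to 1 if instruction i starts a basic block, else 0.
 * Returns the number of basic blocks found. */
int find_leaders(const struct insn *code, int n, int *leader)
{
    int i, blocks = 0;

    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                          /* entry point */
    for (i = 0; i < n; i++) {
        if (code[i].kind != BRANCH)
            continue;
        if (i + 1 < n)
            leader[i + 1] = 1;                  /* instruction after a branch */
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = 1;         /* target of a branch */
    }
    for (i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}

/* The sequence from the slide: bne at index 3 targets Label2 (index 7),
 * beq at index 6 targets Label1 (index 4). */
int slide_example_blocks(int leader[8])
{
    static const struct insn code[8] = {
        { PLAIN, -1 }, { PLAIN, -1 }, { PLAIN, -1 }, { BRANCH, 7 },
        { PLAIN, -1 }, { PLAIN, -1 }, { BRANCH, 4 }, { PLAIN, -1 },
    };
    return find_leaders(code, 8, leader);
}
```

For the slide's sequence this finds three blocks, starting at instructions 0, 4 and 7, which matches BB1, BB2 and BB3.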

Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to illustrate this point here:

Structure of Modern Compilers

Input          Stage                      Dependences                                                  Function
HLL code       Front-end                  Language dependent; machine independent                      Generate intermediate representation
IR             High-level optimizations   Somewhat language independent; largely machine independent   Procedure inlining; loop transformations
Optimized IR   Global optimizer           Mostly language independent; mostly machine independent      Global + local optimizations; register allocation
SSA            Code generator             Language independent; machine dependent                      Instruction selection; scheduling

Output of the code generator: machine code

If you are taking a compiler course this year, these optimisations will be familiar. If not, you need to be at least aware of:
  1. The difference between global and local optimisations
  2. Machine dependent and machine independent optimisations

If you need help with understanding the role of compilers, read section B.8, Crosscutting Issues: The Role of Compilers, in H&P (4/e) on page B-24.

Compiler Optimizations

High-level: at HLL source
  Procedure inlining

Local: within basic-block (BB)
  Common sub-expression elimination
  Constant propagation
  Stack height reduction

Global: across BBs
  Global common sub-expression elimination
  Copy propagation
  Code motion
  Induction variable elimination

Machine-dependent
  Strength reduction
  Pipeline scheduling
  Branch offset optimization

This slide summarises the essential concepts. A little reading around the subject and supplementary note-taking will help with revision.
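Two of the listed optimisations are easy to show by hand at the source level. The sketch below is only an illustration of the idea, with made-up function names; a real compiler performs these rewrites on its intermediate representation, not on the C text. Unsigned arithmetic is used so that the shift is well defined for every input.

```c
/* Before: the sub-expression (a + b) is computed twice and the
 * multiplication by 8 is written as a multiply. */
unsigned before(unsigned a, unsigned b)
{
    return (a + b) * 8 + (a + b);
}

/* After common sub-expression elimination (t holds a + b once) and
 * strength reduction (multiply by 8 replaced by a shift left by 3). */
unsigned after(unsigned a, unsigned b)
{
    unsigned t = a + b;

    return (t << 3) + t;
}
```

Both functions return the same value for every input, for example 63 for a = 3 and b = 4; the second form just needs fewer and cheaper operations.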
