Download - MIPS Registers
-
2008-2009
Informatics 3 - Computer Architecture 40
Additional notes:
Inf3 Computer Architecture - 2007-2008 40
Register Usage in MIPS ABI
Register Number   Soft Name   ABI function for this register
$0                zero        always contains zero
$1                at          reserved for assembler
$2-$3             v0,v1       integer function result (out) or static link (in)
$4-$7             a0-a3       first 4 integer-type function arguments
$8-$15            t0-t7       temporary registers for expression evaluation
$16-$23           s0-s7       registers preserved across function call
$24-$25           t8,t9       temporary registers for expression evaluation
$28               gp          global pointer
$29               sp          stack pointer
$30               fp          frame pointer
$31               ra          return address
The ABI gives well-understood functions to each of the registers in the general purpose register set. There are obvious uses, such as the stack pointer. There are also three other special registers: the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is assigned the return address when a function call is made. Software will put this value on the stack if the called function itself calls further functions. The fp register points to the base of the stack frame for the current function. We'll see that in the next slide. The gp register, when used, points to a pool of global data that can be commonly referenced by all functions. This may include variables with file or global scope.

A function can use registers t0-t9 freely, but if it calls another function they may be overwritten. A function may not overwrite the contents of s0-s7 unless it first saves them and restores their original contents before returning. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.
-
Functions and Stack Frames
int foo (int i)
{
  return bar (i);
}

int bar (int n)
{
  int a = n+1, b = n-1;
  return (a*b);
}

Each function has a dynamically allocated stack frame
Frame contents normally accessed by addresses that are relative to either the stack pointer $sp or the frame pointer $fp

[Figure: memory from high addresses (top) to low addresses (bottom): stack frame for foo, stack frame for bar, free stack space, with $fp and $sp marked. The stack usually grows downwards.]
Stacks usually grow downwards in memory. Can you think why this might be?
-
Anatomy of a Stack Frame
int foo (int i)
{
  return bar (i);
}

int bar (int n)
{
  int a = n+1, b = n-1;
  return (a*b);
}

Positive offsets from $fp = args
Negative offsets from $fp = locals
Not all portions of frame are needed by all functions
Callee save space holds previous $fp, $ra, and any $s0-$s7 that are modified by function bar

[Figure: memory from high addresses (top) to low addresses (bottom): stack frame for foo; stack frame for bar, made up of incoming args, callee-save space, local variables and outgoing args; free stack space. $fp is marked at the incoming args and $sp at the low end of the frame for bar.]
The incoming arguments are values passed from foo to bar. Some of the args may be passed in registers and may not need space on the stack. The callee save space is a region that bar can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may require some storage space on the stack. The outgoing args space is where args for functions that bar calls will be stored. This space will become the incoming args space of functions that bar calls (if any). If bar calls several functions, then the outgoing args space would typically be the maximum space needed by any such function, allowing it to be allocated once.
-
Call Return Sequencing
Call sequence
  Save caller-saved registers
  Copy arguments to stack or regs
  Call the function
Return sequence
  Restore caller-saved registers
Function Prologue
  Allocate callee's stack frame
  Reposition frame pointer
  Save callee-saved registers
< execute body of function >
Function Epilogue
  Restore callee-saved registers
  Restore frame pointer
  De-allocate callee's stack frame
  Return to caller
Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your workstation to produce an assembler file, and identify the four sequences listed in this slide. To do this type:
  gcc -O -S -o assembler.lis program.c
Where assembler.lis is the output file in which your assembler code will be produced, and program.c is the name of your C source file containing foo() and bar().
-
Categorising Data by Location and Access
C programs contain several categories of data, according to where they live and how they are created
The way addresses are computed depends on the category of access

Classification                    Where data is located                           Addressing mode         How created
Embedded constants                Often in a constant pool in the .text section   $pc + signed offset     Static, read-only
Global and static variables       .bss section                                    $gp + signed offset     Static, read or write
Dynamically allocated variables   On the heap                                     GPR + offset            Dynamic, malloc(), free()
Automatic variables               On stack, below frame pointer                   $fp + negative offset   Dynamic, function scope
Function arguments                On stack, above frame pointer                   $fp + positive offset   Dynamic, function scope
Each category of data, whether a function argument or an automatic variable, is allocated in a different way, and is therefore accessed in a different way. There are well-defined regions, such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp, $gp) or may be accessed relative to $pc or a general-purpose register.
-
Addressing Mode Frequency
Bottom-line: few addressing modes account for most of the instructions in programs

[Figure (H&P Fig. 2.7): frequency of the addressing mode (%) for three programs]

Addressing mode   TeX   spice   gcc
Indirect            1       6     1
Scaled              0      16     6
Register           24       3    11
Immediate          43      17    39
Displacement       32      55    40
In practice, compilers usually convert complex address calculations into unsigned integer computations and then use very simple addressing modes based on computed addresses. Many memory references are to variables located on the stack. These use [sp + offset] or [fp + offset] addressing, making the Displacement mode one of the most common.
Try compiling a simple piece of C code into assembler and look at the addressing modes obtained for each variable accessed by the code.
Hint: gcc -S foo.c
-
Displacement Addressing and Data Classification
Stack pointer and Frame pointer relative
  Compiler can often eliminate frame pointer
    Function must not call alloca()
  5 to 10 bits of offset is sufficient in most cases
Register + offset
  Generic form for accessing via pointers
  Multi-dimensional arrays require address calculations
PC relative addresses
  Useful for locating commonly-used constants in a pool of constants located in the .text section
Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets compiled. Try "man alloca" if unsure how to use it.
-
Floating point arithmetic
Usually based on the IEEE 754 floating point standard
Useful when greater range of number is required
  Integer: $-2^{m-1} \ldots +2^{m-1}-1$
  Floating point:
                       Binary                           Decimal
    Single precision   $(2-2^{-23}) \times 2^{127}$     $\approx 10^{38.53}$
    Double precision   $(2-2^{-52}) \times 2^{1023}$    $\approx 10^{308.25}$
See Hennessy & Patterson appendix for details of formats and operations
  Set aside an hour to read their appendix and become familiar with the overall structure of the FP standard (don't memorise details; you can always refer back to the standard if you ever need to use it)
Key points for instruction sets:
  Integer and Floating Point never mixed in same operation
  Separate register sets for integer and FP operations are therefore common
  Floating point operations often optional or omitted from embedded processors
  Other ways to represent fractional values, e.g. fixed-point types
Follow the suggested reading on Hennessy and Patterson from the second bullet point. Make summary notes here.
-
Encoding the Instruction Set
How many bits per instruction?
  Fixed-length 32-bit RISC encoding
  Variable-length encoding (e.g. Intel x86)
  Compact 16-bit RISC encodings
    ARM Thumb
    MIPS16
    ARCompact
Formats define instruction groups with a common set of operands
An instruction format defines a set of operands that are used in common by a group of instructions. An instruction set is simply a collection of formats and the operations defined for each format.
-
Design considerations for ISA encoding
How compact is the encoding?
Is the encoding orthogonal?
How easy is it to extract operands unambiguously?
  Register specifiers should be aligned in all formats (ideally)
  Implicitly defined registers will complicate decode
  How are the literals aligned and/or extended?
Are control transfers easily identifiable?
  If not, slow decoding of branches may increase CPI
Op-code assignment:
  Minimise Hamming distance between codes that perform similar operations.
  Leads to simpler and faster decode logic
If you don't know what Hamming distance is, see page 193 of Andrew Tanenbaum, Computer Networks, 4th edition (a standard text in communications). A Google search will also find the definition. Think about why this is useful in instruction set design, and then make notes here as a reminder.
-
MIPS 32-bit Instruction Formats
R-type (register to register)
  three register operands
  most arithmetic, logical and shift instructions
I-type (register with immediate)
  instructions which use two registers and a constant
  arithmetic/logical with immediate operand
  load and store
  branch instructions with relative branch distance
J-type (jump)
  jump instructions with a 26 bit address
At this point you will find it helpful to read Appendix B from Hennessy and Patterson (4/e), "Putting it all together: The MIPS Architecture", p. B-32. Appendix B is all about ISA design issues, using the MIPS architecture as a teaching vehicle.
-
MIPS R-type instruction format
Field              opcode    reg rs   reg rt   reg rd   shamt    funct
Width              6 bits    5 bits   5 bits   5 bits   5 bits   6 bits

add $1, $2, $3     special   $2       $3       $1       -        add
sll $4, $5, 16     special   -        $5       $4       16       sll
Make your own list of instructions that follow this format.
-
MIPS I-type instruction format
Field                 opcode    reg rs   reg rt   immediate value/addr
Width                 6 bits    5 bits   5 bits   16 bits

lw $1, offset($2)     lw        $2       $1       address offset
beq $4, $5, .L001     beq       $4       $5       (.L001 - (PC + 4)) >> 2
addi $1, $2, -10      addi      $2       $1       0xfff6
Find more examples of instructions that follow this format and write them here.
-
MIPS J-type instruction format
Field         opcode    address
Width         6 bits    26 bits

call func     call      absolute func address >> 2
Again, find other examples of MIPS instructions that use this format.
-
Code density optimisations
Prologue and Epilogue
Constant pools and PC relative loads
2-register formats
Restricted register sets
Non-orthogonality and implicit register operands
Read section B.10, Fallacies and Pitfalls, on page B-39 of Hennessy & Patterson. Make brief notes here to remind you of the main points.
-
Examples:
Instruction Set Architecture   Instruction Size      GP registers             Special Features
ARCompact                      Mixed 16 and 32 bit   8 direct, 32 available   Freely-mixed compact and 32-bit instructions; long-immediate data
ARM Thumb                      16 bit                8                        push and pop for stack frame support
MIPS16                         16 bit                8                        Some special ABI registers still accessible

Most 32-bit architectures used in embedded systems have acquired a subset that is encoded in 16 bits. These instructions still operate on 32-bit data, but are encoded more efficiently. Generally speaking they all use two register operands rather than three, and also restrict the number of general purpose registers to 8. The ARCompact instruction set allows a free mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not permitted in ARM Thumb or MIPS16, where each function must be compiled into the 32-bit or the 16-bit instruction set. Recently, ARM introduced the Thumb-2 instruction set which removes that restriction.
-
ARM Thumb Push and Pop instructions
Particularly effective for encoding function entry and exit code in a compact form.
Operand is a bit vector, with each bit specifying whether one of the callee saved registers should be pushed or popped.
  Push may also save the link register (equiv. to MIPS $ra)
  Pop may then pop that value directly into PC, causing the function to return to the caller. E.g.
    push { r4, r5, r6, r7, lr }
    pop  { r4, r5, r6, r7, pc }
These are multi-cycle operations, performing up to 5 memory reads or writes.
Complex to implement, but highly effective in terms of code density
  Prologue and epilogue can account for 10-15% of the code space
Try to find other Instruction Set Architectures that support multi-register move operations. List them here:
-
Instruction Frequency
Bottom-line: few instruction types account for most of the instructions executed

H&P Fig. 2.16

80x86 instruction         Fraction (%)
load                      22
conditional branch        20
compare                   16
store                     12
add                        8
and                        6
sub                        5
move register-register     4
call                       1
return                     1
Total                     96

Bear in mind that each architecture is different, but that in general the frequencies shown above are representative of typical desktop applications.
Embedded applications often see increasing frequencies of signal processing operations, especially 16-bit multiplications.
-
IS and Performance
ISA and Implementation: cycle time, pipelining, CPI, instruction length
ISA and Compiler: instruction scheduling, code motion, branch optimizations, code generation, code size, register allocation
Implementation: instruction delays, register allocation, functional units

[Figure: ISA, Compiler and Implementation all feed into Performance]
This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.
-
IS Guidelines
Regularity: operations, data types, addressing modes, and registers should be independent (orthogonal)
Primitives, not solutions: do not attempt to match HLL constructs with special IS instructions
Simplify tradeoffs: make it easy for compiler to make choices based on estimated performance
Trust compiler: provide compiler with instructions and primitives that exploit knowledge at compile-time
Instruction Sets can vary enormously from one architecture to another. However, within the set of all RISC architectures there are actually few substantial differences.
It is also worth noting that the number of distinct desktop architectures has been decreasing year on year. In 2007 most new desktop systems shipped will have x86 processors. In the server space one can still find Sun SPARC and IBM PowerPC architectures.
The embedded computing domain has a much greater diversity of architectures. Can you think why this might be?
-
Improving CPU Performance (H&P 2.11; A.1; A3)
CPU performance can be computed by the CPU performance equation:
  CPU time = IC x CPI x Clock time
  (IC = instruction count, CPI = cycles per instruction, Clock time = clock period)
To reduce CPU time: reduce IC, clock period, or CPI
ISA influences implementation, compiler optimizations, and therefore performance
  ISA must be an easy compiler target
  No need to provide too many and too complex instructions
  Compiler has a significant role in improving performance
Essentially, to reduce CPU time we must reduce one of the three primary contributors, or else issue more than one instruction per cycle (or both!)
-
Program Structure: Basic-Blocks (BB)
Definition: straight-line code with single entry and single exit
Boundaries:
  Branches and jumps
  Calls and returns
  Targets of branches, jumps, calls, and returns

          lw   r2,0(r1)           BB1
          lw   r3,4(r1)
          addi r3,r3,n
          bne  r2,r3,Label2
  Label1: lw   r4,8(r1)           BB2
          sub  r2,r2,m
          beq  r2,r0,Label1
  Label2: add  r1,r1,r3           BB3

[Figure: control-flow graph of the three blocks. BB1 leads to BB2 or BB3; BB2 leads back to itself or on to BB3.]
Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to illustrate this point here:
-
Structure of Modern Compilers
HLL code
  |
Front-end
  Dependences: Language dependent; machine independent
  Function:    Generate intermediate representation
  |
IR
  |
High-level optimizations
  Dependences: Somewhat language independent; largely machine independent
  Function:    Procedure inlining; loop transformations
  |
Optimized IR
  |
Global optimizer
  Dependences: Mostly language independent; mostly machine independent
  Function:    Global + local optimizations; register allocation
  |
SSA
  |
Code generator
  Dependences: Language independent; machine dependent
  Function:    Instruction selection; scheduling
  |
Machine code
If you are taking a compiler course this year, these optimisations will be familiar. If not, you need to be at least aware of:
  1. The difference between global and local optimisations
  2. Machine dependent and machine independent optimisations
If you need help with understanding the role of compilers, read section B.8, Crosscutting Issues: The Role of Compilers, in H&P (4/e) on page B-24
-
Compiler Optimizations
High-level: at HLL source
  Procedure inlining
Local: within basic-block (BB)
  Common sub-expression elimination
  Constant propagation
  Stack height reduction
Global: across BBs
  Global common sub-expression elimination
  Copy propagation
  Code motion
  Induction variable elimination
Machine-dependent
  Strength reduction
  Pipeline scheduling
  Branch offset optimization
This slide summarises the essential concepts. A little reading around the subject and supplementary note-taking will help with revision.