Lec 9 Systems Architecture 1
Systems Architecture
Lecture 10: Alternative Instruction Sets
Jeremy R. Johnson, Anatole D. Ruslanov, William M. Mongan
Some or all figures from Computer Organization and Design: The Hardware/Software Interface, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
Introduction
• Objective: To compare MIPS to several alternative instruction set architectures and to better understand the design decisions made in MIPS.
• MIPS is an example of a RISC (Reduced Instruction Set Computer) architecture as compared to a CISC (Complex Instruction Set Computer) architecture.
• MIPS trades instruction complexity for implementation simplicity: programs need a greater number of simple instructions, but the hardware is simpler, giving a shorter clock cycle or fewer clock cycles per instruction.
• Alternative instruction sets, including recent versions of MIPS
– Provide more powerful operations
– Aim at reducing the number of instructions executed
– The danger is a slower cycle time and/or a higher CPI
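The trade-off above can be made concrete with the standard performance equation, CPU time = instruction count × CPI × clock cycle time. The following is a minimal sketch in C; the function name `cpu_time` and the numbers in the usage note are illustrative assumptions, not measurements of any real machine.

```c
#include <assert.h>

/* CPU time = instruction count x cycles per instruction x clock cycle time.
 * All three inputs are hypothetical values supplied by the caller. */
double cpu_time(double instruction_count, double cpi, double cycle_time)
{
    assert(instruction_count >= 0.0 && cpi > 0.0 && cycle_time > 0.0);
    return instruction_count * cpi * cycle_time;
}
```

For example, with made-up numbers, a design that executes 100 instructions at CPI 2 gives `cpu_time(100.0, 2.0, 0.5)`, while a design with more powerful instructions that executes only 80 instructions at CPI 4 gives `cpu_time(80.0, 4.0, 0.5)`. The second is slower despite the lower instruction count, which is exactly the danger named above.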
Characteristics of MIPS
• Load/Store architecture
• General purpose register machine (32 registers)
• ALU operations have 3 register operands (2 source + 1 dest)
• 16-bit constants for immediate mode
• Simple instruction set
– Simple branch operations (beq, bne)
– Use register to set condition (e.g. slt)
– Operations such as move, li, blt built from existing operations
• Uniform encoding
– All instructions are 32 bits long
– Opcode is always in the high-order 6 bits
– 3 types of instruction formats
– Register fields in the same place for all formats
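Because the encoding is uniform, every field can be pulled out of an instruction word with fixed shifts and masks. The sketch below illustrates this in C; the names `field`, `MipsFields`, and `decode` are hypothetical helpers introduced here, and the bit positions follow the standard MIPS R-, I-, and J-formats.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a 32-bit instruction word. */
uint32_t field(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* All fields of all three formats; which ones are meaningful depends on
 * the opcode, but their positions never change. */
typedef struct {
    uint32_t opcode; /* bits 31..26, all formats */
    uint32_t rs;     /* bits 25..21, R and I formats */
    uint32_t rt;     /* bits 20..16, R and I formats */
    uint32_t rd;     /* bits 15..11, R format */
    uint32_t shamt;  /* bits 10..6,  R format */
    uint32_t funct;  /* bits 5..0,   R format */
    uint32_t imm16;  /* bits 15..0,  I format */
    uint32_t addr26; /* bits 25..0,  J format */
} MipsFields;

MipsFields decode(uint32_t word)
{
    MipsFields f;
    f.opcode = field(word, 31, 26);
    f.rs     = field(word, 25, 21);
    f.rt     = field(word, 20, 16);
    f.rd     = field(word, 15, 11);
    f.shamt  = field(word, 10, 6);
    f.funct  = field(word, 5, 0);
    f.imm16  = field(word, 15, 0);
    f.addr26 = field(word, 25, 0);
    return f;
}
```

As a usage example, `decode(0x02324020)` is the word for add $t0, $s1, $s2 (rs = 17, rt = 18, rd = 8, funct = 0x20), and `decode(0x8FA80010)` is the word for lw $t0, 16($sp) (opcode = 35, rs = 29, rt = 8, imm16 = 16). Note that the decoder never needs to know the instruction length or look at the opcode before locating the register fields.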
Design Principles
• Simplicity favors regularity
– uniform instruction length
– all ALU operations have 3 register operands
– register addresses in the same location for all instruction formats
• Smaller is faster
– register architecture
– small number of registers
• Good design demands good compromises
– fixed-length instructions and only 16-bit constants
– several instruction formats but consistent length
• Make common cases fast
– immediate addressing
– 16-bit constants
– only beq and bne
MIPS Addressing Modes
• Immediate Addressing
– 16-bit constant from low-order bits of instruction
– addi $t0, $s0, 4
• Register Addressing
– add $t0, $s0, $s1
• Base Addressing (displacement addressing)
– 16-bit constant from low-order bits of instruction plus base register
– lw $t0, 16($sp)
• PC-Relative Addressing
– (PC+4) + 16-bit word offset from instruction (sign-extended and shifted 2 bits to the left)
– bne $s0, $s1, Target
• Pseudodirect Addressing
– high-order 4 bits of PC+4 concatenated with the 26-bit word address (low-order 26 bits of the instruction) shifted 2 bits to the left
– j Address
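The two control-flow addressing modes can be sketched as small C functions. The names `branch_target` and `jump_target` are hypothetical; the arithmetic follows the slide: branch offsets and jump addresses are word quantities, so both are shifted left by 2 to form byte addresses.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative: target = (PC + 4) + sign-extended 16-bit word offset * 4. */
uint32_t branch_target(uint32_t pc, int16_t offset_words)
{
    /* Convert to unsigned before shifting so negative offsets wrap
     * modulo 2^32 instead of shifting a negative signed value. */
    uint32_t byte_offset = (uint32_t)(int32_t)offset_words << 2;
    return (pc + 4u) + byte_offset;
}

/* Pseudodirect: upper 4 bits of PC + 4, then the 26-bit word address,
 * then two zero bits. */
uint32_t jump_target(uint32_t pc, uint32_t addr26)
{
    assert(addr26 < (1u << 26));
    return ((pc + 4u) & 0xF0000000u) | (addr26 << 2);
}
```

For instance, a branch at address 0x00400000 with offset 3 lands at 0x00400010, and the same branch with offset -2 from 0x00400010 lands at 0x0040000C. A jump reaches anywhere within the current 256 MB region, because only the upper 4 bits come from the PC.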
PowerPC
• Similar to MIPS (RISC)
• Two additional addressing modes
– Indexed addressing: base register + index register
  • PowerPC: lw $t1, $a0+$s3
  • MIPS: add $t0, $a0, $s3
          lw $t1, 0($t0)
– Update addressing: displacement addressing + increment
  • PowerPC: lwu $t0, 4($s3)
  • MIPS: lw $t0, 4($s3)
          addi $s3, $s3, 4
• Additional instructions
– Separate counter register used for loops
– PowerPC: bc Loop, ctr!=0
– MIPS: Loop:
          addi $t0, $t0, -1
          bne $t0, $zero, Loop
Characteristics of 80x86 / IA-32
• Evolved from 8086 (and backward compatible!)
• Register-Memory architecture
• 8 General purpose registers (evolved)
• Complex instruction set
– Instructions vary from 1 to 17 bytes in length
– A postbyte is used to indicate the addressing mode when it is not in the opcode
– Instructions may have many variants
– Special instructions (move, push, pop, string, decimal)
– Use condition codes
– 7 data addressing modes, some complex, with 8- or 32-bit displacement
– Instructions can operate on 8, 16, or 32 bits (mode changed with a prefix)
– One operand must act as both a source and destination
– One operand can come from memory
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow
April 19, 2023 Chapter 2 — Instructions: Language of the Computer
8
The Intel x86 ISA
• Evolution with backward compatibility
– 8080 (1974): 8-bit microprocessor
  • Accumulator, plus 3 index-register pairs
– 8086 (1978): 16-bit extension to 8080
  • Complex instruction set (CISC)
– 8087 (1980): floating-point coprocessor
  • Adds FP instructions and register stack
– 80286 (1982): 24-bit addresses, MMU
  • Segmented memory mapping and protection
– 80386 (1985): 32-bit extension (now IA-32)
  • Additional addressing modes and operations
  • Paged memory mapping as well as segments
The Intel x86 ISA
• Further evolution…
– i486 (1989): pipelined, on-chip caches and FPU
  • Compatible competitors: AMD, Cyrix, …
– Pentium (1993): superscalar, 64-bit datapath
  • Later versions added MMX (Multi-Media eXtension) instructions
  • The infamous FDIV bug
– Pentium Pro (1995), Pentium II (1997)
  • New microarchitecture (see Colwell, The Pentium Chronicles)
– Pentium III (1999)
  • Added SSE (Streaming SIMD Extensions) and associated registers
– Pentium 4 (2001)
  • New microarchitecture
  • Added SSE2 instructions
The Intel x86 ISA
• And further…
– AMD64 (2003): extended architecture to 64 bits
– EM64T – Extended Memory 64 Technology (2004)
  • AMD64 adopted by Intel (with refinements)
  • Added SSE3 instructions
– Intel Core (2006)
  • Added SSE4 instructions, virtual machine support
– AMD64 (announced 2007): SSE5 instructions
  • Intel declined to follow, instead…
– Advanced Vector Extension (announced 2008)
  • Longer SSE registers, more instructions
• If Intel didn’t extend with compatibility, its competitors would!
– Technical elegance ≠ market success
IA-32 Registers and Data Addressing
• Registers in the 32-bit subset that originated with 80386

Name      Use
EAX       GPR 0
ECX       GPR 1
EDX       GPR 2
EBX       GPR 3
ESP       GPR 4
EBP       GPR 5
ESI       GPR 6
EDI       GPR 7
CS        Code segment pointer
SS        Stack segment pointer (top of stack)
DS        Data segment pointer 0
ES        Data segment pointer 1
FS        Data segment pointer 2
GS        Data segment pointer 3
EIP       Instruction pointer (PC)
EFLAGS    Condition codes
IA-32 Addressing Modes
• Register indirect
– Description: address is in a register
– MIPS equivalent: lw $s0, 0($s1)
• Based mode with 8- or 32-bit displacement
– Description: address is contents of base register plus displacement
– MIPS equivalent: lw $s0, const($s1)   # const <= 16 bits
• Base plus scaled index (not in MIPS)
– Description: address is Base + (2^scale × index), where scale is 0, 1, 2, or 3
– MIPS equivalent:
    mul $t0, $s2, 2^scale
    add $t0, $t0, $s1
    lw $s0, 0($t0)
• Base plus scaled index with 8- or 32-bit displacement (not in MIPS)
– Description: address is Base + (2^scale × index) + displacement
– MIPS equivalent:
    mul $t0, $s2, 2^scale
    add $t0, $t0, $s1
    lw $s0, const($t0)   # const <= 16 bits
There are some restrictions on register use (the registers are not fully “general purpose”).
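All of the IA-32 data addressing modes above are special cases of one address computation, Base + (2^scale × index) + displacement. A minimal sketch in C follows; the function name `effective_address` is hypothetical, and a scale of 0 to 3 (multiplying the index by 1, 2, 4, or 8) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address = base + (index << scale) + displacement.
 * Register indirect:         index = 0, disp = 0
 * Based with displacement:   index = 0
 * Base plus scaled index:    disp = 0 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t disp)
{
    assert(scale <= 3); /* index is multiplied by 1, 2, 4, or 8 */
    return base + (index << scale) + (uint32_t)disp;
}
```

For example, `effective_address(0x1000, 5, 2, 8)` addresses element 5 of an array of 4-byte words that starts 8 bytes past 0x1000, which is 0x101C. In MIPS the same access takes the three-instruction sequence shown above.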
Typical IA-32 Instructions
Instruction           Function
JE name               if equal (condition code) {EIP = name}; EIP - 128 <= name < EIP + 128
JMP name              EIP = name
CALL name             SP = SP - 4; M[SP] = EIP + 5; EIP = name
MOVW EBX,[EDI+45]     EBX = M[EDI+45]
PUSH ESI              SP = SP - 4; M[SP] = ESI
POP EDI               EDI = M[SP]; SP = SP + 4
ADD EAX,#6765         EAX = EAX + 6765
TEST EDX,#42          set condition code (flags) with EDX and 42
MOVSL                 M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4
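The register-transfer descriptions for PUSH, POP, and MOVSL can be read as a tiny machine model. The sketch below is a toy illustration, not a model of a real processor: the `Machine` struct, the 64-byte memory, and the helper `word_at` are all assumptions made for the example, and only aligned 32-bit accesses are modeled.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 64u /* toy memory size, chosen for the example */

typedef struct {
    uint32_t esp, esi, edi;      /* the three registers used below */
    uint32_t mem[MEM_BYTES / 4]; /* memory, stored as 32-bit words */
} Machine;

/* Map an aligned byte address to its word in the toy memory. */
uint32_t *word_at(Machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < MEM_BYTES);
    return &m->mem[addr / 4];
}

/* PUSH: SP = SP - 4; M[SP] = value */
void push(Machine *m, uint32_t value)
{
    m->esp -= 4;
    *word_at(m, m->esp) = value;
}

/* POP: result = M[SP]; SP = SP + 4 */
uint32_t pop(Machine *m)
{
    uint32_t value = *word_at(m, m->esp);
    m->esp += 4;
    return value;
}

/* MOVSL: M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4 */
void movsl(Machine *m)
{
    *word_at(m, m->edi) = *word_at(m, m->esi);
    m->edi += 4;
    m->esi += 4;
}
```

Each of these is a single IA-32 instruction that both accesses memory and updates a register, which a load/store architecture such as MIPS would express as two or three separate instructions.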
IA-32 Instruction Formats
• Typical formats (note the different instruction lengths; field widths in bits):
a. JE EIP + displacement: JE (4) | Condition (4) | Displacement (8)
b. CALL: CALL (8) | Offset (32)
c. MOV EBX, [EDI + 45]: MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)
d. PUSH ESI: PUSH (5) | Reg (3)
e. ADD EAX, #6765: ADD (4) | Reg (3) | w (1) | Immediate (32)
f. TEST EDX, #42: TEST (7) | w (1) | Postbyte (8) | Immediate (32)
Implementing IA-32
• Complex instruction set makes implementation difficult
– Hardware translates instructions to simpler microoperations
  • Simple instructions: 1–1
  • Complex instructions: 1–many
– Microengine similar to RISC
– Market share makes this economically viable
• Comparable performance to RISC
– Compilers avoid complex instructions
Architecture Evolution
• Accumulator
– EDSAC
• Extended Accumulator (special purpose register)
– Intel 8086
• General Purpose Register
– register-register (CDC 6600, MIPS, SPARC, PowerPC)
– register-memory (Intel 80386, IBM 360)
– memory-memory (VAX)
• Alternative
– stack
– high-level language
Example: Clearing an Array

void clear1(int array[], int size) {
  int i;
  for (i = 0; i < size; i += 1)
    array[i] = 0;
}

void clear2(int *array, int size) {
  int *p;
  for (p = &array[0]; p < &array[size]; p = p + 1)
    *p = 0;
}

# MIPS code for clear1 ($a0 = array, $a1 = size)
       move $t0,$zero        # i = 0
loop1: sll  $t1,$t0,2        # $t1 = i * 4
       add  $t2,$a0,$t1      # $t2 = &array[i]
       sw   $zero,0($t2)     # array[i] = 0
       addi $t0,$t0,1        # i = i + 1
       slt  $t3,$t0,$a1      # $t3 = (i < size)
       bne  $t3,$zero,loop1  # if (…) goto loop1

# MIPS code for clear2 ($a0 = array, $a1 = size)
       move $t0,$a0          # p = &array[0]
       sll  $t1,$a1,2        # $t1 = size * 4
       add  $t2,$a0,$t1      # $t2 = &array[size]
loop2: sw   $zero,0($t0)     # Memory[p] = 0
       addi $t0,$t0,4        # p = p + 4
       slt  $t3,$t0,$t2      # $t3 = (p < &array[size])
       bne  $t3,$zero,loop2  # if (…) goto loop2
Comparison of Array vs. Ptr
• Multiply “strength reduced” to shift
• Array version requires shift to be inside loop
– Part of index calculation for incremented i
– cf. incrementing pointer
• Compiler can achieve same effect as manual use of pointers
– Induction variable elimination
– Better to make program clearer and safer
ARM & MIPS Similarities
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS

                         ARM              MIPS
Date announced           1985             1985
Instruction size         32 bits          32 bits
Address space            32-bit flat      32-bit flat
Data alignment           Aligned          Aligned
Data addressing modes    9                3
Registers                15 × 32-bit      31 × 32-bit
Input/output             Memory mapped    Memory mapped
Compare and Branch in ARM
• Uses condition codes for result of an arithmetic/logical instruction
– Negative, zero, carry, overflow
– Compare instructions to set condition codes without keeping the result
• Each instruction can be conditional
– Top 4 bits of instruction word: condition value
– Can avoid branches over single instructions
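The compare-then-conditionally-execute pattern can be sketched in C. This is an illustrative model only: the names `Flags`, `cmp`, `Cond`, `cond_holds`, and `add_if` are hypothetical, only four of the ARM conditions are shown, and the enum values are not the real 4-bit encodings. The flag definitions follow ARM's rules for a subtraction (carry is set when there is no borrow).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags; /* negative, zero, carry, overflow */

/* Compare: compute a - b, set the flags, discard the result. */
Flags cmp(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    Flags f;
    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (a >= b);                                 /* no borrow */
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;    /* signed overflow */
    return f;
}

typedef enum { COND_EQ, COND_NE, COND_GE, COND_LT } Cond;

bool cond_holds(Cond cond, Flags f)
{
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_GE: return f.n == f.v; /* signed >= */
    case COND_LT: return f.n != f.v; /* signed <  */
    }
    assert(!"unknown condition");
    return false;
}

/* A conditional add: dest changes only if the condition holds. */
uint32_t add_if(Cond cond, Flags f, uint32_t dest, uint32_t x, uint32_t y)
{
    return cond_holds(cond, f) ? x + y : dest;
}
```

With this model, `add_if(COND_EQ, cmp(3, 3), 0, 2, 2)` performs the addition, while `add_if(COND_EQ, cmp(3, 4), 10, 2, 2)` leaves the destination at 10. In MIPS the second case would need a bne that branches over the add.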
Fallacies
• Powerful instruction ⇒ higher performance
– Fewer instructions required
– But complex instructions are hard to implement
  • May slow down all instructions, including simple ones
– Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
– But modern compilers are better at dealing with modern processors
– More lines of code ⇒ more errors and less productivity
Fallacies
• Backward compatibility ⇒ instruction set doesn’t change
– But they do accrete more instructions
[Figure: growth of the x86 instruction set over time]
Pitfalls
• Sequential words are not at sequential addresses
– Increment by 4, not by 1!
• Keeping a pointer to an automatic variable after procedure returns
– e.g., passing pointer back via an argument
– Pointer becomes invalid when stack popped
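The first pitfall can be checked directly in C, where pointer arithmetic hides the scaling that assembly code must do by hand. In the sketch below the names `word_stride_bytes` and `word_address` are hypothetical, and 8-bit bytes are assumed so that an `int32_t` occupies 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes between element i and element i + 1 of a word array.
 * The caller must ensure element i exists (i + 1 may be one past the end). */
size_t word_stride_bytes(const int32_t *array, size_t i)
{
    assert(array != NULL);
    return (size_t)((const char *)&array[i + 1] - (const char *)&array[i]);
}

/* Byte address of word i, as assembly code must compute it: base + 4 * i. */
uint32_t word_address(uint32_t base, uint32_t i)
{
    return base + 4u * i;
}
```

For any array of 32-bit words the stride is 4, so word 3 of an array based at 0x1000 lives at `word_address(0x1000, 3)`, which is 0x100C, not 0x1003. The second pitfall (returning a pointer to an automatic variable) is deliberately not demonstrated, since dereferencing such a pointer is undefined behavior in C.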
Concluding Remarks
• Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
• Layers of software/hardware
– Compiler, assembler, hardware
• MIPS: typical of RISC ISAs
– cf. x86
Concluding Remarks
• Measure MIPS instruction executions in benchmark programs
– Consider making the common case fast
– Consider compromises

Instruction class    MIPS examples                        SPEC2006 Int    SPEC2006 FP
Arithmetic           add, sub, addi                       16%             48%
Data transfer        lw, sw, lb, lbu, lh, lhu, sb, lui    35%             36%
Logical              and, or, nor, andi, ori, sll, srl    12%             4%
Cond. Branch         beq, bne, slt, slti, sltiu           34%             8%
Jump                 j, jr, jal                           2%              0%
Summary
• Instruction complexity is only one variable
– lower instruction count vs. higher CPI / lower clock rate
• Design Principles:
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
• Instruction set architecture
– a very important abstraction indeed!