
Lec 9 Systems Architecture 1

Systems Architecture

Lecture 10: Alternative Instruction Sets

Jeremy R. Johnson, Anatole D. Ruslanov, William M. Mongan

Some or all figures from Computer Organization and Design: The Hardware/Software Approach, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).


Introduction

• Objective: To compare MIPS to several alternative instruction set architectures and to better understand the design decisions made in MIPS.

• MIPS is an example of a RISC (Reduced Instruction Set Computer) architecture as compared to a CISC (Complex Instruction Set Computer) architecture.

• MIPS trades instruction complexity for a simpler implementation: it needs a greater number of instructions, but in return gets a shorter clock cycle or fewer clock cycles per instruction (CPI).

• Alternative instruction sets, including recent versions of MIPS:
– Provide more powerful operations
– Aim at reducing the number of instructions executed
– The danger is a slower cycle time and/or a higher CPI
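The trade-off above follows from the standard performance equation, CPU time = instruction count × CPI × clock cycle time. The C sketch below evaluates it; the `Design` struct and the two designs in the usage note are hypothetical, with made-up numbers rather than measurements of any real processor.

```c
#include <assert.h>

/* CPU time = instruction count x CPI x clock cycle time.
   All designs evaluated with this function are hypothetical. */
typedef struct {
    double instruction_count; /* instructions executed by the program */
    double cpi;               /* average clock cycles per instruction */
    double cycle_time_ns;     /* clock cycle time in nanoseconds */
} Design;

static double cpu_time_seconds(Design d) {
    assert(d.instruction_count >= 0 && d.cpi > 0 && d.cycle_time_ns > 0);
    return d.instruction_count * d.cpi * d.cycle_time_ns * 1e-9;
}
```

With the made-up designs {1.0e9, 1.0, 1.0} and {0.8e9, 1.5, 1.0}, the "more powerful" second design executes 20% fewer instructions yet takes about 1.2 s against 1.0 s, because its higher CPI outweighs the reduced instruction count.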


Characteristics of MIPS

• Load/store architecture
• General-purpose register machine (32 registers)
• ALU operations have 3 register operands (2 source + 1 destination)
• 16-bit constants for immediate mode
• Simple instruction set
– Simple branch operations (beq, bne)
– Use a register to set a condition (e.g. slt)
– Operations such as move, li, blt built from existing operations

• Uniform encoding
– All instructions are 32 bits long
– Opcode is always in the high-order 6 bits
– 3 types of instruction formats
– Register fields in the same place for all formats
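Because the opcode always occupies the high-order 6 bits and the register fields sit in the same places across formats, decoding needs only fixed shifts and masks. The sketch below (the `RFields` type and `decode_r` name are hypothetical) extracts the R-format fields; 0x02324020 is the standard encoding of add $t0, $s1, $s2, with rs = 17 ($s1), rt = 18 ($s2), rd = 8 ($t0), and funct = 32.

```c
#include <stdint.h>

/* Fields of a MIPS R-format instruction (widths 6, 5, 5, 5, 5, 6 bits). */
typedef struct {
    uint32_t opcode, rs, rt, rd, shamt, funct;
} RFields;

static RFields decode_r(uint32_t word) {
    RFields f;
    f.opcode = word >> 26;          /* opcode: always bits 31..26 */
    f.rs     = (word >> 21) & 0x1F; /* first source register: bits 25..21 */
    f.rt     = (word >> 16) & 0x1F; /* second source register: bits 20..16 */
    f.rd     = (word >> 11) & 0x1F; /* destination register: bits 15..11 */
    f.shamt  = (word >> 6) & 0x1F;  /* shift amount: bits 10..6 */
    f.funct  = word & 0x3F;         /* function code: bits 5..0 */
    return f;
}
```

Since rs and rt occupy the same bit positions in the I-format, the same two shifts also locate the registers of lw, sw, beq, and addi before the opcode has been fully decoded.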


Design Principles

• Simplicity favors regularity
– uniform instruction length
– all ALU operations have 3 register operands
– register addresses in the same location for all instruction formats

• Smaller is faster
– register architecture
– small number of registers

• Good design demands good compromises
– fixed-length instructions and only 16-bit constants
– several instruction formats but consistent length

• Make the common case fast
– immediate addressing
– 16-bit constants
– only beq and bne


MIPS Addressing Modes

• Immediate Addressing
– 16-bit constant from the low-order bits of the instruction
– addi $t0, $s0, 4

• Register Addressing
– add $t0, $s0, $s1

• Base Addressing (displacement addressing)
– 16-bit constant (sign-extended) from the low-order bits of the instruction plus a base register
– lw $t0, 16($sp)

• PC-Relative Addressing
– (PC + 4) + 16-bit word offset from the instruction (sign-extended and shifted left 2 bits)
– bne $s0, $s1, Target

• Pseudodirect Addressing
– high-order 4 bits of PC + 4 concatenated with a 26-bit word address: the low-order 26 bits of the instruction shifted 2 bits to the left
– j Address
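A minimal C sketch of how the base, PC-relative, and pseudodirect addresses above are formed. The function names are hypothetical; the arithmetic follows the list: the 16-bit field is sign-extended, word offsets are multiplied by 4, and a jump keeps only the top 4 bits of PC + 4.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 16-bit immediate field (low-order bits of the instruction). */
static int32_t sign_extend16(uint32_t imm16) {
    assert(imm16 <= 0xFFFFu);
    int32_t v = (int32_t)imm16;
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Base addressing: base register + sign-extended 16-bit offset. */
static uint32_t base_address(uint32_t base_reg, uint32_t imm16) {
    return base_reg + (uint32_t)sign_extend16(imm16);
}

/* PC-relative: (PC + 4) + sign-extended word offset * 4. */
static uint32_t branch_target(uint32_t pc, uint32_t imm16) {
    return pc + 4 + (uint32_t)(sign_extend16(imm16) * 4);
}

/* Pseudodirect: top 4 bits of PC + 4, then the 26-bit word address shifted left 2. */
static uint32_t jump_target(uint32_t pc, uint32_t addr26) {
    return ((pc + 4) & 0xF0000000u) | ((addr26 & 0x03FFFFFFu) << 2);
}
```

Because the 26-bit field is shifted left rather than right, a j instruction can reach any word inside the current 256 MB region selected by the upper 4 bits of PC + 4.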


PowerPC
• Similar to MIPS (RISC)
• Two additional addressing modes
– Indexed addressing: base register + index register
  • PowerPC: lw $t1, $a0+$s3
  • MIPS: add $t0, $a0, $s3
          lw $t1, 0($t0)
– Update addressing: displacement addressing + increment
  • PowerPC: lwu $t0, 4($s3)
  • MIPS: lw $t0, 4($s3)
          addi $s3, $s3, 4
• Additional instructions
– Separate counter register used for loops
– PowerPC: bc Loop, ctr!=0
– MIPS:
  Loop:
          addi $t0, $t0, -1
          bne $t0, $zero, Loop
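To make the two extra modes and the counter-register loop concrete, here is a C sketch over a toy word array addressed by byte address. Everything here (`mem`, `lw_indexed`, `lw_update`, `sum_words`) is hypothetical; each comment names the MIPS sequence that the single PowerPC instruction replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory: 16 words, indexed by byte address (hypothetical). */
enum { MEM_WORDS = 16 };
static uint32_t mem[MEM_WORDS];

static uint32_t load_word(uint32_t byte_addr) {
    assert(byte_addr % 4 == 0 && byte_addr / 4 < MEM_WORDS); /* aligned, in range */
    return mem[byte_addr / 4];
}

/* Indexed addressing: base + index in one instruction.
   MIPS: add $t0, $a0, $s3 ; lw $t1, 0($t0) */
static uint32_t lw_indexed(uint32_t base, uint32_t index) {
    return load_word(base + index);
}

/* Update addressing: load from base + disp and write that address back to base.
   MIPS: lw $t0, 4($s3) ; addi $s3, $s3, 4 */
static uint32_t lw_update(uint32_t *base, int32_t disp) {
    *base += (uint32_t)disp;
    return load_word(*base);
}

/* Counted loop in the style of "bc Loop, ctr!=0": the decrement and the
   test are one PowerPC branch; MIPS needs addi + bne. */
static uint32_t sum_words(uint32_t base, uint32_t ctr) {
    uint32_t sum = 0;
    assert(ctr > 0); /* bottom-tested loop: the body runs at least once */
    do {
        sum += lw_update(&base, 4);
    } while (--ctr != 0);
    return sum;
}
```

With mem[i] = 10 * i, sum_words(0, 4) loads byte addresses 4, 8, 12, and 16 and returns 10 + 20 + 30 + 40 = 100.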


Characteristics of 80x86 / IA-32
• Evolved from 8086 (and backward compatible!!!)

• Register-memory architecture

• 8 general-purpose registers (evolved)

• Complex instruction set
– Instruction lengths vary from 1 to 17 bytes
– A postbyte is used to indicate the addressing mode when it is not in the opcode
– Instructions may have many variants
– Special instructions (move, push, pop, string, decimal)
– Uses condition codes
– 7 data addressing modes, complex, with 8- or 32-bit displacement
– Instructions can operate on 8, 16, or 32 bits; the operand size (mode) is changed with a prefix
– One operand must act as both a source and a destination
– One operand can come from memory

• Saving grace:
– The most frequently used instructions are not too difficult to build
– Compilers avoid the portions of the architecture that are slow

April 19, 2023 Chapter 2 — Instructions: Language of the Computer


The Intel x86 ISA
• Evolution with backward compatibility
– 8080 (1974): 8-bit microprocessor
  • Accumulator, plus 3 index-register pairs
– 8086 (1978): 16-bit extension to 8080
  • Complex instruction set (CISC)
– 8087 (1980): floating-point coprocessor
  • Adds FP instructions and register stack
– 80286 (1982): 24-bit addresses, MMU
  • Segmented memory mapping and protection
– 80386 (1985): 32-bit extension (now IA-32)
  • Additional addressing modes and operations
  • Paged memory mapping as well as segments


The Intel x86 ISA
• Further evolution…
– i486 (1989): pipelined, on-chip caches and FPU
  • Compatible competitors: AMD, Cyrix, …
– Pentium (1993): superscalar, 64-bit datapath
  • Later versions added MMX (Multi-Media eXtension) instructions
  • The infamous FDIV bug
– Pentium Pro (1995), Pentium II (1997)
  • New microarchitecture (see Colwell, The Pentium Chronicles)
– Pentium III (1999)
  • Added SSE (Streaming SIMD Extensions) and associated registers
– Pentium 4 (2001)
  • New microarchitecture
  • Added SSE2 instructions


The Intel x86 ISA
• And further…
– AMD64 (2003): extended architecture to 64 bits
– EM64T, Extended Memory 64 Technology (2004)
  • AMD64 adopted by Intel (with refinements)
  • Added SSE3 instructions
– Intel Core (2006)
  • Added SSE4 instructions, virtual machine support
– AMD64 (announced 2007): SSE5 instructions
  • Intel declined to follow, instead…
– Advanced Vector Extensions (announced 2008)
  • Longer SSE registers, more instructions

• If Intel didn't extend with compatibility, its competitors would!
– Technical elegance ≠ market success


IA-32 Registers and Data Addressing

• Registers in the 32-bit subset that originated with the 80386

Name      Use
EAX       GPR 0
ECX       GPR 1
EDX       GPR 2
EBX       GPR 3
ESP       GPR 4
EBP       GPR 5
ESI       GPR 6
EDI       GPR 7
CS        Code segment pointer
SS        Stack segment pointer (top of stack)
DS        Data segment pointer 0
ES        Data segment pointer 1
FS        Data segment pointer 2
GS        Data segment pointer 3
EIP       Instruction pointer (PC)
EFLAGS    Condition codes


IA-32 Addressing Modes

• Register indirect: address is in a register
  MIPS equivalent: lw $s0, 0($s1)

• Based mode with 8- or 32-bit displacement: address is the contents of the base register plus the displacement
  MIPS equivalent: lw $s0, const($s1)    # const <= 16 bits

• Base plus scaled index (not in MIPS): address = Base + (2^Scale × Index)
  MIPS equivalent: mul $t0, $s2, 2^Scale
                   add $t0, $t0, $s1
                   lw  $s0, 0($t0)

• Base plus scaled index with 8- or 32-bit displacement (not in MIPS): address = Base + (2^Scale × Index) + displacement
  MIPS equivalent: mul $t0, $s2, 2^Scale
                   add $t0, $t0, $s1
                   lw  $s0, const($t0)   # const <= 16 bits

Some modes restrict which registers may be used (for example, ESP cannot serve as an index register), so the registers are not fully “general purpose.”
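For the two scaled-index modes, a small C sketch of the address computation (the function name is hypothetical). Because 2^Scale is a power of two, the multiply in the MIPS sequences can be a single left shift by Scale, which is what the code does; Scale ranges over 0 to 3 (element sizes of 1, 2, 4, or 8 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* address = base + (2^scale * index) + displacement, with scale in 0..3.
   A displacement of 0 gives the plain "base plus scaled index" mode. */
static uint32_t ia32_effective_address(uint32_t base, uint32_t index,
                                       unsigned scale, int32_t displacement) {
    assert(scale <= 3);
    return base + (index << scale) + (uint32_t)displacement;
}
```

For example, with an array of 4-byte ints starting at 0x1000, element 3 plus a field offset of 8 is at ia32_effective_address(0x1000, 3, 2, 8), i.e. 0x1014: one IA-32 operand instead of three MIPS instructions.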


Typical IA-32 Instructions

Instruction          Function
JE name              if equal (condition code) {EIP = name}; EIP - 128 <= name < EIP + 128
JMP name             EIP = name
CALL name            SP = SP - 4; M[SP] = EIP + 5; EIP = name
MOVW EBX,[EDI+45]    EBX = M[EDI+45]
PUSH ESI             SP = SP - 4; M[SP] = ESI
POP EDI              EDI = M[SP]; SP = SP + 4
ADD EAX,#6765        EAX = EAX + 6765
TEST EDX,#42         Set condition codes (flags) from EDX AND 42 (bitwise; result discarded)
MOVSL                M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4
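The Function column is essentially register-transfer pseudocode, so it translates directly into C. The sketch below models a hypothetical toy machine (64 words of memory addressed by byte address) and implements PUSH, POP, CALL, and MOVSL exactly as the table states, assuming the 5-byte CALL encoding the table implies and ignoring the string direction flag.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state, for illustration only. */
enum { MEM_WORDS = 64 };
typedef struct {
    uint32_t esp, eip, esi, edi;
    uint32_t mem[MEM_WORDS];
} Machine;

static uint32_t *word_at(Machine *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS); /* aligned, in range */
    return &m->mem[addr / 4];
}

/* PUSH: SP = SP - 4; M[SP] = value */
static void push(Machine *m, uint32_t value) {
    m->esp -= 4;
    *word_at(m, m->esp) = value;
}

/* POP: value = M[SP]; SP = SP + 4 */
static uint32_t pop(Machine *m) {
    uint32_t value = *word_at(m, m->esp);
    m->esp += 4;
    return value;
}

/* CALL: push the return address EIP + 5 (length of CALL), then jump. */
static void call(Machine *m, uint32_t target) {
    push(m, m->eip + 5);
    m->eip = target;
}

/* MOVSL: M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4 */
static void movsl(Machine *m) {
    *word_at(m, m->edi) = *word_at(m, m->esi);
    m->edi += 4;
    m->esi += 4;
}
```

The stack grows toward lower addresses: after push(&m, 7) from ESP = 256, ESP is 252 and the value sits in the word at byte address 252; pop undoes both steps.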


IA-32 Instruction Formats

• Typical formats (note the different instruction lengths; field widths in bits):
a. JE EIP + displacement: JE (4) | Condition (4) | Displacement (8)
b. CALL: CALL (8) | Offset (32)
c. MOV EBX, [EDI + 45]: MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)
d. PUSH ESI: PUSH (5) | Reg (3)
e. ADD EAX, #6765: ADD (4) | Reg (3) | w (1) | Immediate (32)
f. TEST EDX, #42: TEST (7) | w (1) | Postbyte (8) | Immediate (32)
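Two of these formats map onto real one-byte opcodes, which makes a small encoding sketch possible: PUSH r32 is 0x50 plus the 3-bit register number (format d), and the short JE is 0x74, i.e. opcode nibble 0111 followed by condition nibble 0100 ("equal"), then an 8-bit signed displacement (format a). The function names below are hypothetical; the register numbers follow the GPR 0 to 7 order of the IA-32 register table.

```c
#include <assert.h>
#include <stdint.h>

/* IA-32 register numbers, in GPR 0..7 order. */
enum Reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Format d: PUSH opcode (5 bits, 01010) | Reg (3 bits): one byte total. */
static uint8_t encode_push(enum Reg r) {
    assert((unsigned)r <= EDI);
    return (uint8_t)((0x0Au << 3) | (unsigned)r);
}

/* Format a: JE opcode nibble (0111) | condition nibble (0100 = equal),
   followed by an 8-bit signed displacement: two bytes total. */
static void encode_je(int8_t displacement, uint8_t out[2]) {
    out[0] = (uint8_t)((0x7u << 4) | 0x4u);
    out[1] = (uint8_t)displacement;
}
```

So PUSH ESI encodes as the single byte 0x56, while a JE that jumps back 2 bytes is 0x74 0xFE; the 8-bit displacement is why the instruction table limits JE targets to within about 128 bytes of EIP.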


Implementing IA-32

• Complex instruction set makes implementation difficult
– Hardware translates instructions to simpler microoperations
  • Simple instructions: 1–1
  • Complex instructions: 1–many
– Microengine similar to RISC
– Market share makes this economically viable

• Comparable performance to RISC
– Compilers avoid complex instructions


Architecture Evolution

• Accumulator
– EDSAC

• Extended accumulator (special-purpose register)
– Intel 8086

• General-purpose register
– register-register (CDC 6600, MIPS, SPARC, PowerPC)
– register-memory (Intel 80386, IBM 360)
– memory-memory (VAX)

• Alternatives
– stack
– high-level language
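These classes differ mainly in where operands live. As an illustration, the C sketch below models the statement C = A + B on two of them: a stack machine, where operands are implicit on a stack, and an accumulator machine, where one implicit register holds every intermediate result. The helper names are hypothetical; the comments give the usual register-memory (IA-32 style) and load-store (MIPS) sequences for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stack machine (hypothetical): instructions name no registers. */
enum { STACK_MAX = 8 };
typedef struct { int32_t data[STACK_MAX]; int top; } Stack;

static void push(Stack *s, int32_t v) { assert(s->top < STACK_MAX); s->data[s->top++] = v; }
static int32_t pop(Stack *s) { assert(s->top > 0); return s->data[--s->top]; }

/* ADD pops both operands and pushes their sum. */
static void add(Stack *s) {
    int32_t b = pop(s);
    int32_t a = pop(s);
    push(s, a + b);
}

/* Stack:      Push A; Push B; Add; Pop C
   Reg-memory: mov eax, A ; add eax, B ; mov C, eax
   Load-store: lw $t0, A ; lw $t1, B ; add $t2, $t0, $t1 ; sw $t2, C */
static int32_t stack_c_equals_a_plus_b(int32_t a, int32_t b) {
    Stack s = { {0}, 0 };
    push(&s, a);
    push(&s, b);
    add(&s);
    return pop(&s);
}

/* Accumulator: Load A; Add B; Store C */
static int32_t accumulator_c_equals_a_plus_b(int32_t a, int32_t b) {
    int32_t acc = a; /* Load A: the single implicit register */
    acc = acc + b;   /* Add B */
    return acc;      /* Store C */
}
```

The stack and accumulator versions use the shortest instructions (few or no operand fields) but force every value through one place, which is the bottleneck the general-purpose register designs remove.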


Example: Clearing an Array

void clear1(int array[], int size)
{
    int i;
    for (i = 0; i < size; i += 1)
        array[i] = 0;
}

void clear2(int *array, int size)
{
    int *p;
    for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}

       move $t0, $zero         # i = 0
loop1: sll  $t1, $t0, 2        # $t1 = i * 4
       add  $t2, $a0, $t1      # $t2 = &array[i]
       sw   $zero, 0($t2)      # array[i] = 0
       addi $t0, $t0, 1        # i = i + 1
       slt  $t3, $t0, $a1      # $t3 = (i < size)
       bne  $t3, $zero, loop1  # if (…) goto loop1

       move $t0, $a0           # p = &array[0]
       sll  $t1, $a1, 2        # $t1 = size * 4
       add  $t2, $a0, $t1      # $t2 = &array[size]
loop2: sw   $zero, 0($t0)      # Memory[p] = 0
       addi $t0, $t0, 4        # p = p + 4
       slt  $t3, $t0, $t2      # $t3 = (p < &array[size])
       bne  $t3, $zero, loop2  # if (…) goto loop2

Both MIPS loops test the condition at the bottom, so they assume size > 0.


Comparison of Array vs. Ptr

• Multiply “strength reduced” to shift

• Array version requires the shift to be inside the loop
– Part of the index calculation for incremented i
– cf. incrementing a pointer

• Compiler can achieve the same effect as manual use of pointers
– Induction variable elimination
– Better to make the program clearer and safer


ARM & MIPS Similarities

• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS

                         ARM              MIPS
Date announced           1985             1985
Instruction size         32 bits          32 bits
Address space            32-bit flat      32-bit flat
Data alignment           Aligned          Aligned
Data addressing modes    9                3
Registers                15 × 32-bit      31 × 32-bit
Input/output             Memory mapped    Memory mapped


Compare and Branch in ARM

• Uses condition codes for the result of an arithmetic/logical instruction
– Negative, zero, carry, overflow
– Compare instructions set condition codes without keeping the result

• Each instruction can be conditional
– Top 4 bits of instruction word: condition value
– Can avoid branches over single instructions
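A C sketch, under simplifying assumptions, of how a compare sets the four flags and how the 4-bit condition field decides whether an instruction executes. Only a subset of conditions is modeled (EQ, NE, GE, LT, GT, LE, AL, using their ARM field values); the type and function names are hypothetical, and any condition outside that subset simply reports "not taken" here.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four ARM condition flags. */
typedef struct { bool n, z, c, v; } Flags;

/* CMP a, b: compute a - b, set N, Z, C, V, and discard the result. */
static Flags cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0;                   /* Negative: sign bit of result */
    f.z = (r == 0);                         /* Zero */
    f.c = (a >= b);                         /* Carry: no borrow in unsigned a - b */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0; /* Overflow: signed result is wrong */
    return f;
}

/* Subset of the condition field values (bits 31..28 of an instruction). */
enum Cond { COND_EQ = 0x0, COND_NE = 0x1, COND_GE = 0xA, COND_LT = 0xB,
            COND_GT = 0xC, COND_LE = 0xD, COND_AL = 0xE };

static unsigned cond_field(uint32_t instruction) { return instruction >> 28; }

/* Does an instruction with this condition execute under these flags? */
static bool cond_holds(Flags f, unsigned cond) {
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_GE: return f.n == f.v;
    case COND_LT: return f.n != f.v;
    case COND_GT: return !f.z && f.n == f.v;
    case COND_LE: return f.z || f.n != f.v;
    case COND_AL: return true;
    default:      return false; /* conditions omitted from this sketch */
    }
}
```

Signed "less than" is N ≠ V rather than just N, so comparing INT32_MIN with 1 still gives LT even though the subtraction overflows. An instruction whose top nibble is 1011 (LT) after such a compare would execute without any branch, which is how ARM skips single instructions.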


Instruction Encoding


Fallacies

• Powerful instruction ⇒ higher performance
– Fewer instructions required
– But complex instructions are hard to implement
  • May slow down all instructions, including simple ones
– Compilers are good at making fast code from simple instructions

• Use assembly code for high performance
– But modern compilers are better at dealing with modern processors
– More lines of code ⇒ more errors and less productivity


Fallacies

• Backward compatibility ⇒ instruction set doesn't change
– But they do accrete more instructions

(Figure: x86 instruction set)


Pitfalls

• Sequential words are not at sequential addresses
– Increment by 4, not by 1!

• Keeping a pointer to an automatic variable after the procedure returns
– e.g., passing a pointer back via an argument
– Pointer becomes invalid when the stack is popped
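Both pitfalls can be illustrated in C. The sketch below (all function names hypothetical) computes word addresses with a stride of 4, shows that C pointer arithmetic applies that scaling automatically, and uses caller-owned storage as the safe alternative to handing back the address of a local variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address of element i of a word array starting at byte address base:
   consecutive 32-bit words are 4 bytes apart, so the step is 4, not 1. */
static uint32_t word_address(uint32_t base, uint32_t i) {
    return base + 4u * i;
}

/* In C, p + 1 already advances by sizeof(int32_t) bytes; this returns
   that distance, measured in bytes. In MIPS the programmer adds 4 by hand. */
static ptrdiff_t stride_in_bytes(const int32_t *p) {
    return (const char *)(p + 1) - (const char *)p;
}

/* Dangling-pointer pitfall: rather than returning &local (invalid once the
   stack frame is popped), write the result into storage the caller owns. */
static void store_result(int32_t *out, int32_t value) {
    *out = value;
}
```

The first two functions express the same fact at two levels: the assembly programmer writes the 4 explicitly (addi $t0, $t0, 4), while the C compiler inserts it from the pointer's type.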


Concluding Remarks

• Design principles

1. Simplicity favors regularity

2. Smaller is faster

3. Make the common case fast

4. Good design demands good compromises

• Layers of software/hardware
– Compiler, assembler, hardware

• MIPS: typical of RISC ISAs
– cf. x86


Concluding Remarks

• Measure MIPS instruction executions in benchmark programs
– Consider making the common case fast
– Consider compromises

Instruction class   MIPS examples                            SPEC2006 Int   SPEC2006 FP
Arithmetic          add, sub, addi                           16%            48%
Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui        35%            36%
Logical             and, or, nor, andi, ori, sll, srl        12%            4%
Cond. branch        beq, bne, slt, slti, sltiu               34%            8%
Jump                j, jr, jal                               2%             0%

Summary

• Instruction complexity is only one variable
– Lower instruction count vs. higher CPI / lower clock rate

• Design principles:
– Simplicity favors regularity
– Smaller is faster
– Good design demands compromise
– Make the common case fast

• Instruction set architecture
– A very important abstraction indeed!