4-inst set architecture

8/2/2019 4-Inst Set Architecture

1/43


2/43

4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 2

Instruction Set Architecture

Instruction set architecture is based on thestructure of a computer i.e. the description ofthe CPU in terms of Registers, Addressabilityand various Arithmetic / Control and Storeoperations etc.

Assembly / Machine language programmermust understand ISA of target processor toprogram for it.


3/43


Instruction Set Architecture

instruction set

High level language code : C, C++, Java,Fortan,

hardware

Assembly language code: architecture specific statements

Machine language code: architecture specific bit patterns

software

compiler

assembler

The programs written in any higher level languageeventually get converted to assembly level containinginstructions in mnemonics of instruction set.

The Assembler converts these into machine languagebefore execution


4/43


ISA Metrics

Orthogonally All operand modes are available with any data type or

instruction type.

Completeness

Support for a wide range of operations and target

applications

Regularity

No overloading for the meanings of instruction fields

Streamlined

Resource needs easily determined

Ease of assembly language programming

Ease of implementation


5/43


Instruction Set Design Issues

Instruction set design issues include: Where are operands stored?

registers, memory, stack, accumulator

How many explicit operands are there?

0, 1, 2, or 3 How is the operand location specified?

register, immediate, indirect, . . .

What type & size of operands are supported?

byte, int, float, double, string, vector. . . What operations are supported?

add, sub, mul, move, compare . . .


6/43


Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)

Accumulator + Index Registers

(Manchester Mark I, IBM 700 series 1953)

Separation of Programming Modelfrom Implementation

High-level Language Based Concept of a Family

(B5000 1963) (IBM 360 1964)

General Purpose Register Machines

Complex Instruction Sets Load/Store Architecture

RISC

(Vax, Intel 8086 1977-80) (CDC 6600, Cray 1 1963-76)

(Mips,Sparc,88000,IBM RS6000, . . .1987+)


7/43


Classifying ISAs

Accumulator (before 1960):1 address add A acc acc + mem[A]

Stack (1960s to 1970s):0 address add tos tos + next

Memory-Memory (1970s to 1980s):2 address add A, B mem[A] mem[A] + mem[B]3 address add A, B, C mem[A] mem[B] + mem[C]

Register-Memory (1970s to present):2 address add R1, A R1 R1 + mem[A]

load R1, A R1 mem[A]

Register-Register (Load/Store) (1960s to present):3 address add R1, R2, R3 R1 R2 + R3

load R1, R2 R1 mem[R2]store R1, R2 mem[R1] R2


8/43


Types of Addressing Modes (VAX)

Addressing Mode Example Action

1. Register direct Add R4, R3 R4


9/43


Types of Addressing Modes Intel Instruction Set

Register Instructions involving data manipulation through registers

Immediate Involves immediate values contained within the instructionDirect Transfer data to/from memory location to memory/ registerRegister Indirect Transfer a data byte/word to a location whose address is

specified in a register e.g. [Bx]Use of Byte PTR, Word PTR, DWord PTR specifies boundary

of data.Base + IndexIndirect

MOV AX, [BX+SI]Relative MOV AX, (BX+4)Base relative

plus index MOV AX, (BX+SI+4)Scaled Index MOV AX,[AX+4*BX]


10/43


Instruction Encoding

Variable Size

Instruction length varies based on opcode and addressspecifiers

For example, VAX instructions vary between 1 and 53 bytes,while x86 instruction vary between 1 and 17 bytes.

Good source code density, but difficult to decode and pipeline

Fixed Size

Only a single size for all instructions

For example, DLX, MIPS, Power PC, Sparc all have 32 bitinstructions

Not as good code density, but easier to decode and pipeline

Hybrid Size

Have multiple format lengths specified by the opcode

For example, IBM 360/370

Compromise between code density and ease in decoding


11/43


DLX Architecture

Introduced by Hennessey and Patterson in 1990 Derived from many different instruction set architectures

from MIPS, Sun, IBM, Intel, HP, AMD, etc.

DLX is a typical RISC architecture.

32-bit fixed length instructions 3 instruction formats

Load/store architecture

Simple branch conditions (no condition codes)

DLX registers 32 32-bit general-purpose registers (R0 = 0)

32 32-bit (or 16 64-bit) floating point registers

Special purpose registers (e.g., FP Status and PC)


12/43


DLX Design Decisions

DLX is based on the following design decisions Use general purpose registers with a load-store architecture

Support commonly used addressing modes

displacement, immediate, and register deferred

Support simple instructions that occur frequently load, store, add, subtract, move, and, shift, compare

equal, branch, jump, call, and return

Support commonly required data sizes

8 (byte), 16 (half word), and 32-bit (word) integers

32 (float) and 64-bit (double) floating point Use fixed length instructions that are easy to decode

Provide plenty of general purpose registers and separatefloating point registers


13/43


DLX Instruction Formats

Op

31 26 01516202125

rs1 rd immediate

Op

31 26 025

Op

31 26 01516202125

rs1 rs2

offset added to PC

rd

(a) Register-Register (R-type) ADD R1, R2, R3

561011

(b) Register-Immediate (I-type) SUB R1, R2, #3

(c) Jump / Call (J-type) JUMP end

function

(ALU immediate operations, loads and stores, conditional branch, jump )

(jump, jump and link, trap and return from exception)

(ALL reg. operations, read/write special registers and moves)


14/43


Intel 80x86 Integer Registers


15/43


X86 Operand Types

x86 instructions typically have two operands,where one operand is both a source and adestination operand.

Possible combinations include

Source/destination type Second source typeRegister Register

Register Immediate

Register Memory

Memory RegisterMemory Immediate

No memory-memory or immediate-immediate

Immediate can be 8, 16, or 32 bits


16/43


Intel 80x86 Floating Point Registers


17/43


80x86 Instructions

Data movement

(move, push, pop) Arithmetic and logic

(logic ops, tests CCs, shifts, integer and decimal arithmetic)

Control flow(branches, jumps, calls, returns)

String instructions(move and compare)

FP data movement(load, load const., store)

Arithmetic instructions

(add, subtract, multiply, divide, square root, absolute value) Comparisons

(Result to Flag)

Transcendental functions(sin, cos, log, etc.)


18/43


80x86 Instruction Format

Instructions sizes vary from 1 to 17 bytes


19/43


Instruction Set 8088 / 8086 CPU

FORMATS

1. One Byte The instructions have implied data or registeroperands.The least significant three bits specify register if any

2. Register to Register

Two Byte instruction where first byte contains Opcode followed by widthand second operand has 2nd register and R/ M fields. Mod field is 11

3. Register to / from Memory without displacement

NOTE: W fields D1 gives Dir i.e. 0 Byte2 Reg is Source, 1 Byte 2 Reg is DestinationW fields D0 bit specifies whether it is a eight bit data of 16 bit data

R/M field specifies one of 8 registers. The MOD field is 11 for Register, 00 for memorywithout displacement, 01 for memory with 8 bit displacement and 10 for 16 bit displacement


20/43



21/43


4. Register to / from Memory with DisplacementOne or Two additional bytes specify displacement

5. Immediate operand to RegisterIn this instruction the 7bits of first byte and bits 3-4 of secondbyte specify the op code. The last two bytes specify the data


22/43


6. Immediate Operand to Memory with 16 bit Displacement

First two bytes specify the Opcode MOD and R/M as before followed bytwo bytes of displacement and two bytes of data

Significance of OPCODE fields


23/43



24/43



25/43



26/43



27/43



28/43



29/43



30/43


Graphics and Multimedia Instruction Set Extensions

Several companies have extended their computers

instruction sets to support graphics and multimediaapplications. Intels MMX Technology Intels Internet Streaming SIMD Extensions AMDs 3DNow! Technology

Suns Visual Instruction Set Motorolas and IBMs AltiVec Technology

These extensions improve the performance of Computer-aided design

Internet applications Computer visualization Video games Speech recognition


31/43


MMX Instructions

MMX Technology adds 57 new instructions tothe x86 architecture (Reference article on PII MMX)

Some of these instructions include PADD(b, w, d) Packed addition PSUB(b, w, d) Packed subtraction PCMPE(b, w, d) Packed compare equal PMULLw Packed word multiply low PMULHw Packed word multiply high PMADDwd Packed word multiply-add PSRL(w, d, q) Pack shift right logical PACKSS(wb, dw) Pack data PUNPCK(bw, wd, dq) Unpack data PAND, POR, PXOR Packed logical operations


32/43


MMX Data Types

MMX Technology supports operations on thefollowing 64-bit integer data types.

Packed byte (eight 8-bit elements)

Packed word (four 16-bit elements)

Packed double word (two 32-bit elements)

Packed quad word (one 64-bit elements)


33/43


SIMD Operations

A3 A2 A1 A0

B3 B2 B1 B0

A3+B3 A2+B2 A1+B1 A0+B0

MMX Technology allows a Single Instruction to work on Multiplepieces of Data (SIMD).

Example: PADD[W]: Packed add word

4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.


34/43


Saturating Arithmetic

Both wrap-around and saturating ADD instructions are supported.

With saturating arithmetic, results that overflow are set to thelargest value.

Below are examples for both types

PADD[W]: Packed wrap-around add PADDUS[W]: Packed saturating add


35/43


Pack and Unpack Instructions

Pack and unpack instructions provideconversion between standard data types andpacked data types

PACKSS[DW]: Packed Signed with Saturating Double to Packed Word


36/43


Multiply-Add Operations

Many graphics applications require multiply-accumulate operations

Vector Dot Products

Matrix Multiplication

Fast Fourier Transforms (FFTs)

Filter implementations

PMADDWD: Packed multiply-add word to double


37/43


Vector Dot Product of two 8 Byte vectors

A dot product on two 8-element vector can be performed

using 9 MMX instructions

a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0

a0*c0+..+ a7*c7

With MMX 9 Instructions2 loads for one of the vectors

Other vector is loaded by PMADD2 PMADDs,2 PADDs,2 shifts (if reqd. to fix precision)1 Store


38/43


Vector Dot Product of two 8 Byte vectors

Without MMX 40 instructions

16 Load8 Multiply8 Shift7 Add1 Store

a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0

a0*c0+..+ a7*c7


39/43


MMX Technology Summary

MMX technology extends the Intel x86 architecture toimprove the performance of multimedia and graphicsapplications.

Most MMX instructions can be executed in one clock cycle,so the performance improvement will be more dramatic than

the simple ratio of instruction counts. It provides a speedup of 1.5 to 2.0 for certain applications.

Only increase the chip area by about 5%.


40/43


MMX Technology Summary

MMX instructions are hand-coded in assembly orimplemented as libraries to achieve high

performance.

MMX data types use the x86 floating point registers

Makes it easy to handle context switches

Makes it hard to perform MMX and floating pointinstructions at the same time


41/43


Internet Streaming SIMD Extensions

ISSE introduced eight 128-bit data registers(called XMM registers)

In 64-bit modes, they are available as 16 X 64-bit registers

The 128-bit packed single-precision floating-point data type,allows four single-precision operations to be performed

simultaneously


42/43

4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed42

ISSE Data Type

ISSE extensions introduced one new data type

128-Bit Packed Single-Precision Floating-Point Data Type

SSE 2 introduced five data types


43/43

43

Internet Streaming SIMD Extensions

Intels Internet Streaming SIMD Extensions (ISSE) Help improve the performance of video and 3D applications

70 new instructions beyond MMX Technology

Adds new 128-bit registers

Provide the ability to perform parallel floating point operations

Four parallel operations on 32-bit numbers

Reciprocal and reciprocal root instructions - normalization

Packed average instruction Motion compensation

Provide data pre-fetch instructions

Make certain applications 1.5 to 2.0 times faster.

4-inst set architecture

Documents