4-inst set architecture
TRANSCRIPT
-
8/2/2019 4-Inst Set Architecture
1/43
-
8/2/2019 4-Inst Set Architecture
2/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 2
Instruction Set Architecture
Instruction set architecture is based on thestructure of a computer i.e. the description ofthe CPU in terms of Registers, Addressabilityand various Arithmetic / Control and Storeoperations etc.
Assembly / Machine language programmermust understand ISA of target processor toprogram for it.
-
8/2/2019 4-Inst Set Architecture
3/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 3
Instruction Set Architecture
instruction set
High level language code : C, C++, Java,Fortan,
hardware
Assembly language code: architecture specific statements
Machine language code: architecture specific bit patterns
software
compiler
assembler
The programs written in any higher level languageeventually get converted to assembly level containinginstructions in mnemonics of instruction set.
The Assembler converts these into machine languagebefore execution
-
8/2/2019 4-Inst Set Architecture
4/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 4
ISA Metrics
Orthogonally All operand modes are available with any data type or
instruction type.
Completeness
Support for a wide range of operations and target
applications
Regularity
No overloading for the meanings of instruction fields
Streamlined
Resource needs easily determined
Ease of assembly language programming
Ease of implementation
-
8/2/2019 4-Inst Set Architecture
5/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 5
Instruction Set Design Issues
Instruction set design issues include: Where are operands stored?
registers, memory, stack, accumulator
How many explicit operands are there?
0, 1, 2, or 3 How is the operand location specified?
register, immediate, indirect, . . .
What type & size of operands are supported?
byte, int, float, double, string, vector. . . What operations are supported?
add, sub, mul, move, compare . . .
-
8/2/2019 4-Inst Set Architecture
6/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 6
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Modelfrom Implementation
High-level Language Based Concept of a Family
(B5000 1963) (IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets Load/Store Architecture
RISC
(Vax, Intel 8086 1977-80) (CDC 6600, Cray 1 1963-76)
(Mips,Sparc,88000,IBM RS6000, . . .1987+)
-
8/2/2019 4-Inst Set Architecture
7/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 7
Classifying ISAs
Accumulator (before 1960):1 address add A acc acc + mem[A]
Stack (1960s to 1970s):0 address add tos tos + next
Memory-Memory (1970s to 1980s):2 address add A, B mem[A] mem[A] + mem[B]3 address add A, B, C mem[A] mem[B] + mem[C]
Register-Memory (1970s to present):2 address add R1, A R1 R1 + mem[A]
load R1, A R1 mem[A]
Register-Register (Load/Store) (1960s to present):3 address add R1, R2, R3 R1 R2 + R3
load R1, R2 R1 mem[R2]store R1, R2 mem[R1] R2
-
8/2/2019 4-Inst Set Architecture
8/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 8
Types of Addressing Modes (VAX)
Addressing Mode Example Action
1. Register direct Add R4, R3 R4
-
8/2/2019 4-Inst Set Architecture
9/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 9
Types of Addressing Modes Intel Instruction Set
Register Instructions involving data manipulation through registers
Immediate Involves immediate values contained within the instructionDirect Transfer data to/from memory location to memory/ registerRegister Indirect Transfer a data byte/word to a location whose address is
specified in a register e.g. [Bx]Use of Byte PTR, Word PTR, DWord PTR specifies boundary
of data.Base + IndexIndirect
MOV AX, [BX+SI]Relative MOV AX, (BX+4)Base relative
plus index MOV AX, (BX+SI+4)Scaled Index MOV AX,[AX+4*BX]
-
8/2/2019 4-Inst Set Architecture
10/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 10
Instruction Encoding
Variable Size
Instruction length varies based on opcode and addressspecifiers
For example, VAX instructions vary between 1 and 53 bytes,while x86 instruction vary between 1 and 17 bytes.
Good source code density, but difficult to decode and pipeline
Fixed Size
Only a single size for all instructions
For example, DLX, MIPS, Power PC, Sparc all have 32 bitinstructions
Not as good code density, but easier to decode and pipeline
Hybrid Size
Have multiple format lengths specified by the opcode
For example, IBM 360/370
Compromise between code density and ease in decoding
-
8/2/2019 4-Inst Set Architecture
11/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 11
DLX Architecture
Introduced by Hennessey and Patterson in 1990 Derived from many different instruction set architectures
from MIPS, Sun, IBM, Intel, HP, AMD, etc.
DLX is a typical RISC architecture.
32-bit fixed length instructions 3 instruction formats
Load/store architecture
Simple branch conditions (no condition codes)
DLX registers 32 32-bit general-purpose registers (R0 = 0)
32 32-bit (or 16 64-bit) floating point registers
Special purpose registers (e.g., FP Status and PC)
-
8/2/2019 4-Inst Set Architecture
12/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 12
DLX Design Decisions
DLX is based on the following design decisions Use general purpose registers with a load-store architecture
Support commonly used addressing modes
displacement, immediate, and register deferred
Support simple instructions that occur frequently load, store, add, subtract, move, and, shift, compare
equal, branch, jump, call, and return
Support commonly required data sizes
8 (byte), 16 (half word), and 32-bit (word) integers
32 (float) and 64-bit (double) floating point Use fixed length instructions that are easy to decode
Provide plenty of general purpose registers and separatefloating point registers
-
8/2/2019 4-Inst Set Architecture
13/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 13
DLX Instruction Formats
Op
31 26 01516202125
rs1 rd immediate
Op
31 26 025
Op
31 26 01516202125
rs1 rs2
offset added to PC
rd
(a) Register-Register (R-type) ADD R1, R2, R3
561011
(b) Register-Immediate (I-type) SUB R1, R2, #3
(c) Jump / Call (J-type) JUMP end
function
(ALU immediate operations, loads and stores, conditional branch, jump )
(jump, jump and link, trap and return from exception)
(ALL reg. operations, read/write special registers and moves)
-
8/2/2019 4-Inst Set Architecture
14/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 14
Intel 80x86 Integer Registers
-
8/2/2019 4-Inst Set Architecture
15/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 15
X86 Operand Types
x86 instructions typically have two operands,where one operand is both a source and adestination operand.
Possible combinations include
Source/destination type Second source typeRegister Register
Register Immediate
Register Memory
Memory RegisterMemory Immediate
No memory-memory or immediate-immediate
Immediate can be 8, 16, or 32 bits
-
8/2/2019 4-Inst Set Architecture
16/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 16
Intel 80x86 Floating Point Registers
-
8/2/2019 4-Inst Set Architecture
17/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 17
80x86 Instructions
Data movement
(move, push, pop) Arithmetic and logic
(logic ops, tests CCs, shifts, integer and decimal arithmetic)
Control flow(branches, jumps, calls, returns)
String instructions(move and compare)
FP data movement(load, load const., store)
Arithmetic instructions
(add, subtract, multiply, divide, square root, absolute value) Comparisons
(Result to Flag)
Transcendental functions(sin, cos, log, etc.)
-
8/2/2019 4-Inst Set Architecture
18/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 18
80x86 Instruction Format
Instructions sizes vary from 1 to 17 bytes
-
8/2/2019 4-Inst Set Architecture
19/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 19
Instruction Set 8088 / 8086 CPU
FORMATS
1. One Byte The instructions have implied data or registeroperands.The least significant three bits specify register if any
2. Register to Register
Two Byte instruction where first byte contains Opcode followed by widthand second operand has 2nd register and R/ M fields. Mod field is 11
3. Register to / from Memory without displacement
NOTE: W fields D1 gives Dir i.e. 0 Byte2 Reg is Source, 1 Byte 2 Reg is DestinationW fields D0 bit specifies whether it is a eight bit data of 16 bit data
R/M field specifies one of 8 registers. The MOD field is 11 for Register, 00 for memorywithout displacement, 01 for memory with 8 bit displacement and 10 for 16 bit displacement
-
8/2/2019 4-Inst Set Architecture
20/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 20
-
8/2/2019 4-Inst Set Architecture
21/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 21
4. Register to / from Memory with DisplacementOne or Two additional bytes specify displacement
5. Immediate operand to RegisterIn this instruction the 7bits of first byte and bits 3-4 of secondbyte specify the op code. The last two bytes specify the data
-
8/2/2019 4-Inst Set Architecture
22/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 22
6. Immediate Operand to Memory with 16 bit Displacement
First two bytes specify the Opcode MOD and R/M as before followed bytwo bytes of displacement and two bytes of data
Significance of OPCODE fields
-
8/2/2019 4-Inst Set Architecture
23/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 23
-
8/2/2019 4-Inst Set Architecture
24/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 24
-
8/2/2019 4-Inst Set Architecture
25/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 25
-
8/2/2019 4-Inst Set Architecture
26/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 26
-
8/2/2019 4-Inst Set Architecture
27/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 27
-
8/2/2019 4-Inst Set Architecture
28/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 28
-
8/2/2019 4-Inst Set Architecture
29/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 29
-
8/2/2019 4-Inst Set Architecture
30/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 30
Graphics and Multimedia Instruction Set Extensions
Several companies have extended their computers
instruction sets to support graphics and multimediaapplications. Intels MMX Technology Intels Internet Streaming SIMD Extensions AMDs 3DNow! Technology
Suns Visual Instruction Set Motorolas and IBMs AltiVec Technology
These extensions improve the performance of Computer-aided design
Internet applications Computer visualization Video games Speech recognition
-
8/2/2019 4-Inst Set Architecture
31/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 31
MMX Instructions
MMX Technology adds 57 new instructions tothe x86 architecture (Reference article on PII MMX)
Some of these instructions include PADD(b, w, d) Packed addition PSUB(b, w, d) Packed subtraction PCMPE(b, w, d) Packed compare equal PMULLw Packed word multiply low PMULHw Packed word multiply high PMADDwd Packed word multiply-add PSRL(w, d, q) Pack shift right logical PACKSS(wb, dw) Pack data PUNPCK(bw, wd, dq) Unpack data PAND, POR, PXOR Packed logical operations
-
8/2/2019 4-Inst Set Architecture
32/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 32
MMX Data Types
MMX Technology supports operations on thefollowing 64-bit integer data types.
Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed double word (two 32-bit elements)
Packed quad word (one 64-bit elements)
-
8/2/2019 4-Inst Set Architecture
33/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 33
SIMD Operations
A3 A2 A1 A0
B3 B2 B1 B0
A3+B3 A2+B2 A1+B1 A0+B0
MMX Technology allows a Single Instruction to work on Multiplepieces of Data (SIMD).
Example: PADD[W]: Packed add word
4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.
-
8/2/2019 4-Inst Set Architecture
34/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 34
Saturating Arithmetic
Both wrap-around and saturating ADD instructions are supported.
With saturating arithmetic, results that overflow are set to thelargest value.
Below are examples for both types
PADD[W]: Packed wrap-around add PADDUS[W]: Packed saturating add
-
8/2/2019 4-Inst Set Architecture
35/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 35
Pack and Unpack Instructions
Pack and unpack instructions provideconversion between standard data types andpacked data types
PACKSS[DW]: Packed Signed with Saturating Double to Packed Word
-
8/2/2019 4-Inst Set Architecture
36/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 36
Multiply-Add Operations
Many graphics applications require multiply-accumulate operations
Vector Dot Products
Matrix Multiplication
Fast Fourier Transforms (FFTs)
Filter implementations
PMADDWD: Packed multiply-add word to double
-
8/2/2019 4-Inst Set Architecture
37/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 37
Vector Dot Product of two 8 Byte vectors
A dot product on two 8-element vector can be performed
using 9 MMX instructions
a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0
a0*c0+..+ a7*c7
With MMX 9 Instructions2 loads for one of the vectors
Other vector is loaded by PMADD2 PMADDs,2 PADDs,2 shifts (if reqd. to fix precision)1 Store
-
8/2/2019 4-Inst Set Architecture
38/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 38
Vector Dot Product of two 8 Byte vectors
Without MMX 40 instructions
16 Load8 Multiply8 Shift7 Add1 Store
a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0
a0*c0+..+ a7*c7
-
8/2/2019 4-Inst Set Architecture
39/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 39
MMX Technology Summary
MMX technology extends the Intel x86 architecture toimprove the performance of multimedia and graphicsapplications.
Most MMX instructions can be executed in one clock cycle,so the performance improvement will be more dramatic than
the simple ratio of instruction counts. It provides a speedup of 1.5 to 2.0 for certain applications.
Only increase the chip area by about 5%.
-
8/2/2019 4-Inst Set Architecture
40/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 40
MMX Technology Summary
MMX instructions are hand-coded in assembly orimplemented as libraries to achieve high
performance.
MMX data types use the x86 floating point registers
Makes it easy to handle context switches
Makes it hard to perform MMX and floating pointinstructions at the same time
-
8/2/2019 4-Inst Set Architecture
41/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 41
Internet Streaming SIMD Extensions
ISSE introduced eight 128-bit data registers(called XMM registers)
In 64-bit modes, they are available as 16 X 64-bit registers
The 128-bit packed single-precision floating-point data type,allows four single-precision operations to be performed
simultaneously
-
8/2/2019 4-Inst Set Architecture
42/43
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed42
ISSE Data Type
ISSE extensions introduced one new data type
128-Bit Packed Single-Precision Floating-Point Data Type
SSE 2 introduced five data types
-
8/2/2019 4-Inst Set Architecture
43/43
43
Internet Streaming SIMD Extensions
Intels Internet Streaming SIMD Extensions (ISSE) Help improve the performance of video and 3D applications
70 new instructions beyond MMX Technology
Adds new 128-bit registers
Provide the ability to perform parallel floating point operations
Four parallel operations on 32-bit numbers
Reciprocal and reciprocal root instructions - normalization
Packed average instruction Motion compensation
Provide data pre-fetch instructions
Make certain applications 1.5 to 2.0 times faster.