
Chapter 3. Processor Design

CPU function : to execute instructions stored in a memory.

Instruction cycle

– fetch cycle : fetch an instruction from main memory

– execute cycle : decode the instruction, fetch any required operands, and perform the operation.

The behavior of CPU : sequence of register transfer operations.

CPU cycle time ( tCPU ) : the time required for the shortest CPU microoperation.

Interrupt : I/O devices request service from CPU

CPU Design issues

1. The CPU should be as fast as the available technology permits. The number of components in the CPU must be kept small.

2. Because of its size, main memory must be constructed using less expensive, and therefore slower, technology than that of the CPU ( CPU-to-memory speed ratio : 1 to 10 ).

Von Neumann CPU design : the basis of almost all CPU designs

CPU operation form

X1 := fi ( X1, X2 )

X1 and X2 denote a CPU register ( AC, DR or PC ) or an external memory location M(adr)

fi : fixed-point addition/subtraction, shifting, and logical operations

Fetch operation : IR.AR := M(PC)

Instruction format : I ( instruction ) = op.adr ( op : opcode, adr : memory address )

Two essential memory-addressing instructions

load : AC := M(adr)

store : M(adr) := AC
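The fetch and execute cycles above can be sketched as a small simulator. This is only an illustrative sketch: the opcode numbering, the ADD and HALT instructions, and the representation of memory as a Python list are assumptions made for the example, not part of the original design.

```python
# Hypothetical opcodes for a one-address accumulator machine (illustration only).
LOAD, STORE, ADD, HALT = 0, 1, 2, 3

def run(memory):
    """Execute a program held in `memory`; instructions are (op, adr) pairs."""
    pc, ac = 0, 0
    while True:
        op, adr = memory[pc]           # fetch cycle: IR.AR := M(PC)
        pc += 1
        if op == LOAD:                 # execute cycle starts here
            ac = memory[adr]           # AC := M(adr)
        elif op == STORE:
            memory[adr] = ac           # M(adr) := AC
        elif op == ADD:
            ac = ac + memory[adr]      # AC := AC + M(adr)
        elif op == HALT:
            return memory

program = [
    (LOAD, 5), (ADD, 6), (STORE, 7), (HALT, 0),
    None,       # address 4: unused
    2, 3, 0,    # addresses 5, 6, 7: two operands and the result
]
result = run(list(program))
print(result[7])
```

The program loads M(5), adds M(6), and stores the sum in M(7), using only the load and store forms for memory access.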

Architecture Extensions

1. Additional addressable registers can be provided for storing operands and addresses ( index registers or base registers ), replacing the single accumulator by a set of registers.

2. The capabilities of the ALU can be extended from fixed-point addition/subtraction to fixed-point multiplication.

3. Special registers can be included to facilitate the transfer of control between instructions within a program ( such as a flag register ).

4. The transfer of control between different subroutines due to interrupts or subroutine calls and returns is facilitated by special registers ( such as the PSW, program status word, of the IBM 360 ). Control is transferred by saving the current PSW in main memory and loading a new PSW into the CPU. Control is returned to the first program by retrieving the previously saved PSW from memory and restoring it to the CPU. Most computers now use a LIFO ( last-in first-out ) stack.

5. Simultaneous processing of two or more distinct instructions is facilitated by extending the memory addressing circuits and adding sufficient buffer storage to the CPU. The ALU can be divided into K parts to execute K instructions at once : pipelining.

A coprocessor is a specialized instruction execution unit that can be coupled to a microprocessor so that instructions to be executed by the coprocessor can be included in programs fetched by the microprocessor. For example, the floating-point instructions of the Motorola 68020 can be executed by means of an auxiliary 68881 floating-point coprocessor. A set of coprocessor instructions is defined for the 68020; when the 68020 fetches and decodes such an instruction, it transfers the command portion to the coprocessor, which then executes it.

3.2 Information representation

A word : an information unit of fixed length

ASCII code ( 7 bits, usually stored in an 8-bit byte ) : American Standard Code for Information Interchange

information
  - instruction
  - data
      - numerical data
          - fixed-point number
          - floating-point number
      - non-numerical data

o fixed-point number : $b_0 b_1 \cdots b_{n-1}$, $b_i \in \{0, 1\}$

o floating-point number : $M \times 2^E$ ( M : mantissa, E : exponent )

To identify the major information types, a tag is attached to each word.

Advantage : instruction sets can be simplified and software errors can be detected

Disadvantage : waste of memory

Word format of Burroughs B 6500/7500

bit 51 : parity bit | bits 50-48 : tag | bits 47-0 : information

Error detection and correction

parity bit : a single check bit

– The parity bit is appended to an n-bit word $X = (x_0, x_1, \cdots, x_{n-1})$ to form the (n+1)-bit word $X^* = (x_0, x_1, \cdots, x_{n-1}, c_0)$

even parity : $c_0 = x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1}$

odd parity : $c_0 = 1 \oplus x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1}$

If $c_0 = c_0^*$ ( the parity bit recomputed from the received word ), then there is no single-bit error, but an even number of bit errors may have gone undetected.
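A minimal sketch of the parity check described above. The function names `parity_bit` and `parity_ok` are illustrative, and words are represented as Python lists of 0/1 values for simplicity.

```python
from functools import reduce
from operator import xor

def parity_bit(bits, odd=False):
    """Return c0 for the data bits: XOR of all bits (complemented for odd parity)."""
    c0 = reduce(xor, bits, 0)
    return c0 ^ 1 if odd else c0

def parity_ok(word, odd=False):
    """Recompute the parity bit from the received data bits and compare with c0."""
    *data, c0_received = word
    return parity_bit(data, odd) == c0_received

x = [1, 0, 1, 1]
word = x + [parity_bit(x)]           # even-parity word X*

single_error = word.copy()
single_error[0] ^= 1                 # one bit flipped: detected

double_error = single_error.copy()
double_error[1] ^= 1                 # two bits flipped: goes undetected

print(parity_ok(word), parity_ok(single_error), parity_ok(double_error))
```

The double-error case shows why a single parity bit cannot detect an even number of bit errors.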

Single-bit error correction for an n-bit word

c : number of check bits required for single-error correction

n + c : number of possible single-error locations

The c check bits must distinguish the n + c single-error locations and the error-free case, so $2^c > n + c$, that is,

$2^c \ge n + c + 1$

for n = 4, c ≥ 3; for n = 8, c ≥ 4; for n = 16, c ≥ 5

With one additional check bit, these codes can also detect double errors : SECDED ( Single Error Correction / Double Error Detection )
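The bound on the number of check bits can be evaluated directly. The helper name `min_check_bits` is an illustrative choice; it simply searches for the smallest c satisfying the inequality.

```python
def min_check_bits(n):
    """Smallest c such that 2**c >= n + c + 1 (single-error correction)."""
    c = 1
    while 2**c < n + c + 1:
        c += 1
    return c

for n in (4, 8, 16):
    print(n, min_check_bits(n))
```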

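Example) 16-bit word $X = (x_0, x_1, \cdots, x_{15})$ → 5 check bits $(c_0, c_1, c_2, c_3, c_4)$

Stored word $(x_0, x_1, \cdots, x_{15}, c_0, c_1, c_2, c_3, c_4)$ → received word $(x_0^r, x_1^r, \cdots, x_{15}^r, c_0^r, c_1^r, c_2^r, c_3^r, c_4^r)$

Calculate a new set of check bits $(c_0^*, c_1^*, c_2^*, c_3^*, c_4^*)$ from $(x_0^r, x_1^r, \cdots, x_{15}^r)$

The error vector $E = (c_0^r \oplus c_0^*,\ c_1^r \oplus c_1^*,\ c_2^r \oplus c_2^*,\ c_3^r \oplus c_3^*,\ c_4^r \oplus c_4^*)$

If E = (0, 0, 0, 0, 0), then no detectable error has occurred.

If E = (0, 0, 0, 1, 1), then a single fault in a bit common only to c3 & c4 is detected: the error caused $x_0$ to become $\bar{x}_0$. The error is corrected by changing $x_0^r$ to $\bar{x}_0^r$.

Check-bit equations ( data-bit positions 0 to 15 )

$c_0 = x_2 \oplus x_5 \oplus x_{10} \oplus x_{11} \oplus x_{12} \oplus x_{13} \oplus x_{14} \oplus x_{15}$

$c_1 = x_4 \oplus x_5 \oplus x_6 \oplus x_7 \oplus x_8 \oplus x_9 \oplus x_{10} \oplus x_{15}$

$c_2 = x_1 \oplus x_2 \oplus x_3 \oplus x_7 \oplus x_8 \oplus x_9 \oplus x_{14} \oplus x_{15}$

$c_3 = x_0 \oplus x_2 \oplus x_3 \oplus x_5 \oplus x_6 \oplus x_9 \oplus x_{12} \oplus x_{13}$

$c_4 = x_0 \oplus x_1 \oplus x_3 \oplus x_4 \oplus x_6 \oplus x_8 \oplus x_{11} \oplus x_{13}$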
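The example can be checked with a short sketch that uses the five check-bit equations as given. The function names and the list-of-bits representation are assumptions made for illustration; the correction step simply looks for the data bit whose set of covering check bits matches the error vector E.

```python
# Data-bit positions covered by each check bit, taken from the equations above.
CHECKS = [
    [2, 5, 10, 11, 12, 13, 14, 15],   # c0
    [4, 5, 6, 7, 8, 9, 10, 15],       # c1
    [1, 2, 3, 7, 8, 9, 14, 15],       # c2
    [0, 2, 3, 5, 6, 9, 12, 13],       # c3
    [0, 1, 3, 4, 6, 8, 11, 13],       # c4
]

def check_bits(x):
    """Compute (c0, ..., c4) as the XOR of the covered data bits."""
    return [sum(x[i] for i in cover) % 2 for cover in CHECKS]

def encode(x):
    """Append the five check bits to the 16 data bits."""
    return list(x) + check_bits(x)

def correct(received):
    """Return (corrected data bits, error vector E) for a 21-bit received word."""
    x, c_r = list(received[:16]), list(received[16:])
    e = [r ^ s for r, s in zip(c_r, check_bits(x))]
    for i in range(16):
        # A single error in data bit i flips exactly the check bits that cover it.
        if e == [int(i in cover) for cover in CHECKS]:
            x[i] ^= 1
    return x, e

x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
received = encode(x)
received[0] ^= 1                      # single fault in x0
corrected, e = correct(received)
print(e, corrected == x)
```

An error in a check bit itself produces an E with a single 1, which matches no data bit, so the data bits are left unchanged.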

Number Format

1. The types of numbers to be represented : integer, real number

2. The range of values

3. The precision of values

4. Hardware complexity

Binary number x0 x1 · · · xn-1 ( x0 : sign, x1 · · · xn-1 : magnitude )

Sign-magnitude representation

+ 5 : 0101

– 5 : 1101

One’s complement representation

- positive number : same as sign-magnitude

- negative number : bitwise logical complement

+ 5 : 0101    + 0 : 0 · · · 0

– 5 : 1010    – 0 : 1 · · · 1

- two representations of 0

Two’s complement representation

- positive number : same as sign-magnitude

- negative number : do the bitwise complement, then add 1 to the least significant bit, and ignore the carry generated from the most significant bit

+ 5 : 0101 → bitwise complement : 1010 → add 1 : 1011 = – 5

- unique representation of 0
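The three representations can be compared with a small sketch. The function names are illustrative, and each function returns the n-bit pattern as a string; the two's-complement version follows the complement-then-add-1 procedure described above.

```python
def sign_magnitude(v, n):
    """Sign bit followed by the (n-1)-bit magnitude."""
    sign = 1 if v < 0 else 0
    return format((sign << (n - 1)) | abs(v), f"0{n}b")

def ones_complement(v, n):
    """Negative numbers are the bitwise complement of the magnitude."""
    mask = (1 << n) - 1
    bits = abs(v) if v >= 0 else ~abs(v) & mask
    return format(bits, f"0{n}b")

def twos_complement(v, n):
    """Negative numbers: complement, add 1, ignore the carry out of the MSB."""
    mask = (1 << n) - 1
    bits = v if v >= 0 else ((~abs(v) & mask) + 1) & mask
    return format(bits, f"0{n}b")

for f in (sign_magnitude, ones_complement, twos_complement):
    print(f.__name__, f(5, 4), f(-5, 4))
```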

IEEE 754 standard 32 bit floating point number format

The sign bit S and the mantissa M form a sign-magnitude binary number. The magnitude part of a normalized sign-magnitude number has 1 as its most significant digit, so there is no need to store this 1. The complete mantissa, called the significand, is actually 1.M, so the precision is effectively increased by 1 bit. The actual exponent value is computed as E – 127. A 1-bit left ( right ) shift of M multiplies ( divides ) the value by 2, which is the same effect as incrementing ( decrementing ) E by 1.

Fields, from the most significant bit : sign bit S ( 1 bit ) | exponent E ( 8 bits, excess-127 binary integer ) | mantissa M ( 23 bits )

$N = (-1)^S \times 2^{E-127} \times (1.M)$ for 0 < E < 255

Examples ( bit patterns written as S | E | M )

1 | 01111111 | 100 · · · 0 : $N = -2^{127-127} \times 1.5 = -1.5$

0 | 01111111 | 000 · · · 0 : $N = (-1)^0 \times 2^{127-127} \times 1.0 = 1$

0 | 01111111 | 1100 · · · 0 : $N = 1.75$

Magnitude range of normalized numbers : $1 \times 2^{-126}$ ~ $(2 - 2^{-23}) \times 2^{127}$

32-bit fixed-point ( two's-complement integer ) number range : $-2^{31}$ ~ $2^{31} - 1$

If the result of a floating-point operation is not a valid floating-point number, then a special code referred to as not-a-number ( NaN ) is used.

If E = 255 and M ≠ 0, then N = NaN

If E = 255 and M = 0, then $N = (-1)^S \infty$

If 0 < E < 255, then $N = (-1)^S \times 2^{E-127} \times (1.M)$

If E = 0 and M ≠ 0, then N is a denormalized number, $N = (-1)^S \times 2^{-126} \times (0.M)$ ( gradual underflow )

If E = 0 and M = 0, then N = 0
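The decoding rules can be sketched as follows. The function name `decode` is illustrative; the E = 0, M ≠ 0 case is treated here as an IEEE 754 denormalized number, and the result is cross-checked against Python's standard `struct` module.

```python
import struct

def decode(word):
    """Decode a 32-bit IEEE 754 single-precision bit pattern given as an int."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    m = word & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        return float("nan") if m else sign * float("inf")
    if e == 0:
        # Zero (M = 0) or denormalized number: no hidden leading 1.
        return sign * 2.0**-126 * (m / 2**23)
    return sign * 2.0**(e - 127) * (1 + m / 2**23)   # (-1)^S 2^(E-127) (1.M)

def decode_with_struct(word):
    """Reference decoding using the standard library."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

for pattern in (0xBFC00000, 0x3F800000, 0x3FE00000):
    print(hex(pattern), decode(pattern))
```

The three patterns correspond to S | E | M = 1 | 01111111 | 100···0, 0 | 01111111 | 000···0, and 0 | 01111111 | 1100···0.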

Floating-point round-off error

: caused by the fact that every number must be represented by a limited number

of bits

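$N_1 + N_2 = M_1 \times 2^{e_1} + M_2 \times 2^{e_2}$

For example,

$1.1 \times 2^3 + 1.01 \times 2^2 = 2^2 (11 + 1.01) = 2^2 \times 100.01 = 1.0001 \times 2^4$

With full 23-bit mantissas, the exponents must be aligned before adding:

$1.10 \cdots 01 \times 2^3 + 1.010 \cdots 01 \times 2^2 = 2^3 (1.10 \cdots 01 + 0.1010 \cdots 01)$

M1 = 1 0 · · · 0 1

M2 = 0 1 0 · · · 0 1

Aligning the second operand shifts its significand right by one position ( $1.M_2 \times 2^{-1}$ = 0.1 0 1 0 · · · 0 1 ), so its least significant 1 is shifted out of the 23-bit mantissa field and is lost.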
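The loss of low-order bits can be observed numerically. This sketch emulates 32-bit arithmetic by rounding a Python float to single precision with the `struct` module; the helper name `to_f32` is an assumption, and the rounding used is round-to-nearest rather than the simple truncation suggested by the shift-out picture.

```python
import struct
from fractions import Fraction

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Both operands are exactly representable with a 23-bit mantissa.
n1 = (1 + 2**-1 + 2**-23) * 2**3     # 1.10...01  x 2^3
n2 = (1 + 2**-2 + 2**-23) * 2**2     # 1.010...01 x 2^2

exact = Fraction(n1) + Fraction(n2)  # exact rational sum
rounded = to_f32(n1 + n2)            # sum stored as a 32-bit float
error = Fraction(rounded) - exact    # round-off error of the stored result

print(float(exact), rounded, error)
```

The operands fit in the format, but their sum needs more than 23 mantissa bits, so the stored result differs from the exact sum.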

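Example of matrix multiplication : accumulation of roundoff errors

A × B = C

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & & \vdots \\
a_{n1} & \cdots & & a_{nn}
\end{pmatrix}
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & & & \vdots \\
b_{n1} & \cdots & & b_{nn}
\end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & & & \vdots \\
c_{n1} & \cdots & & c_{nn}
\end{pmatrix}
$$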
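Each element of C is a sum of n products, and every multiplication and addition can contribute a round-off error. The sketch below compares a product computed with emulated 32-bit arithmetic against the exact rational result; the helper names and the 10 × 10 test matrices filled with the 32-bit value nearest 0.1 are assumptions chosen for illustration.

```python
import struct
from fractions import Fraction

def to_f32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def matmul_f32(a, b):
    """Matrix product with every multiply and add rounded to 32 bits."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc = to_f32(acc + to_f32(a[i][k] * b[k][j]))
            c[i][j] = acc
    return c

def matmul_exact(a, b):
    """Matrix product in exact rational arithmetic."""
    n = len(a)
    return [[sum(Fraction(a[i][k]) * Fraction(b[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

n = 10
a = [[to_f32(0.1)] * n for _ in range(n)]
b = [[to_f32(0.1)] * n for _ in range(n)]
error = abs(Fraction(matmul_f32(a, b)[0][0]) - matmul_exact(a, b)[0][0])
print(float(error))
```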

3.3 Instruction sets

- to specify an operation to be carried out and the set of operands or data to be used

Basic Instruction Format

opcode | operands ( N bits, numbered N–1 · · · 0 )

X1 ← f ( X1, X2, · · · , Xn )

Addressing modes : How to specify the current value of data X

- immediate addressing : when data X is constant, its value can be placed in the operand field

- direct addressing : the corresponding operand field contains the address X of the storage location containing the required value

- indirect addressing : the instruction contains the address W of a storage location which in turn contains the address X of the desired operand

Intel 8085 examples

MVI A, 99 : immediate addressing

MOV A, B : direct addressing

- absolute addressing : requires the complete operand address to appear in the instruction operand field

- relative addressing : the operand field contains a relative address, and the effective address of the operand is some function of the relative address and the contents of a register R

The reasons for relative addressing

① Since all the address information need not be included in the instruction, instruction length is reduced.

② By changing the contents of R, the processor can change the absolute addresses referred to by a block of instructions B. R : a base register

③ R can be used for storing indices to facilitate the processing of indexed data. R : an index register

Disadvantage of relative addressing : extra hardware and extra processing time are needed to calculate the effective address.
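The addressing modes can be contrasted in a few lines. This is a sketch under simple assumptions: memory is a Python dictionary, the mode names are passed as strings, and the relative mode computes the effective address as the register contents plus the offset, which is one common choice of function.

```python
def operand(mode, field, memory, registers=None):
    """Return the operand value selected by the addressing mode."""
    if mode == "immediate":
        return field                        # the field is the value itself
    if mode == "direct":
        return memory[field]                # the field is the operand address
    if mode == "indirect":
        return memory[memory[field]]        # the field points to the address
    if mode == "relative":
        reg, offset = field                 # effective address = R + offset
        return memory[registers[reg] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")

memory = {10: 42, 20: 10, 105: 7}
registers = {"R": 100}

print(operand("immediate", 99, memory))
print(operand("direct", 10, memory))
print(operand("indirect", 20, memory))
print(operand("relative", ("R", 5), memory, registers))
```

Changing `registers["R"]` relocates every relative reference without modifying the instructions, which is the base-register use described above.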

Number of addresses : the fewer the addresses, the shorter the instruction. Fewer addresses also mean more primitive instructions and therefore longer programs.

A 3-address machine : ADD Z, X, Y : add the contents of memory locations X and Y and place the result in Z ( Z ← X + Y )

A 2-address machine : ADD X, Y : AC ← X + Y or X ← X + Y

A 1-address machine : ADD X : AC ← AC + X

A 0-address machine : ADD : all operands are required to be in the top positions of the stack
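A zero-address machine can be sketched as a small stack interpreter. The PUSH and POP mnemonics and the string-based program format are assumptions for illustration; note that PUSH and POP still carry one address, while ADD itself names no operands.

```python
def run_stack_machine(program, memory):
    """Execute a list of instruction strings on a stack machine."""
    stack = []
    for instruction in program:
        op, *args = instruction.split()
        if op == "PUSH":                 # push M(adr) onto the stack
            stack.append(memory[args[0]])
        elif op == "POP":                # pop the top of the stack into M(adr)
            memory[args[0]] = stack.pop()
        elif op == "ADD":                # zero-address: operands come from the stack
            y = stack.pop()
            x = stack.pop()
            stack.append(x + y)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory

# Z <- X + Y takes one instruction on a 3-address machine, four here.
memory = run_stack_machine(["PUSH X", "PUSH Y", "ADD", "POP Z"],
                           {"X": 2, "Y": 3, "Z": 0})
print(memory["Z"])
```

The four-instruction program illustrates the trade-off stated above: shorter instructions, longer programs.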

Instruction types : what types of instructions should be included in a general-purpose processor?

Requirements of an instruction set

① should be complete, able to evaluate any computable function

② should be efficient, in that frequently required functions can be performed rapidly using relatively few instructions

③ should be regular

④ should be compatible, to reduce hardware & software design cost

No standard machine

Five main types of instructions

① Data-transfer instructions, which copy information from one location to another either in the processor’s internal register set or in the external main memory.

② Arithmetic instructions, which perform operations on numerical data.

③ Logical instructions, which include Boolean and other non-numerical operations.

④ Program-control instructions, such as branch instructions, which change the sequence in which programs are executed.

⑤ Input-output ( IO ) instructions, which cause information to be transferred between the processor or its main memory and external IO devices.

RISC versus CISC

With cheaper hardware, instructions tend to increase both in number and complexity.

Suppose that a particular complex operation F can be implemented either by a single complex instruction IF or by a multi-instruction routine PF composed of simple instructions.

- Execution of PF will be slower than that of IF because of the instruction fetching time.

- PF occupies more memory space than IF.

- IF adds to the complexity of the control unit, thus increasing the size of the processor and its design time.


Assembly language : programming is simpler using IF

High-level language : the improvement in execution speed from IF may not be fully realizable. A compiler will translate F into the corresponding instruction IF, which uses fixed CPU registers and has a fixed execution time. If IF is not available, an efficient “optimizing” compiler may be able to generate object code OF corresponding to PF that exploits information known at compilation time to reduce the execution time for F.

The speed gap between IF and PF can be narrowed by designing the small instruction set required for PF so as to reduce the instruction fetch and execution cycle times as far as possible. Another speed advantage of PF over IF is that PF can be interrupted in mid-operation, whereas IF must proceed to termination before the CPU can respond to an interrupt.


The main features of RISC(Reduced Instruction Set Computer)

1. Relatively few instruction types and addressing modes.

2. Fixed and easily decoded instruction formats.

3. Fast single-cycle instruction execution. /* Main point */

4. Hardwired rather than microprogrammed control.

5. Memory access is limited mainly to load and store instructions. A large number of registers in the CPU : most RISC instructions involve only register-to-register operations internal to the CPU.

6. Use of compilers to optimize object code performance.

Key point : efficient compilation through cooperation of the machine architects and compiler writers

In scientific computing applications with much floating-point arithmetic, CISC is better.

RISC I microprocessor ( by Patterson )

A single-chip 32-bit CPU with 138 32-bit general-purpose registers

- to achieve single-cycle execution with instructions of fixed size ( all instructions are 32 bits long )

- to access main memory with load and store only

- to provide some support for high-level languages

Instruction format of RISC I

bits 0-6 : opcode | bit 7 : set condition code | bits 8-12 : destination RD | bits 13-17 : source 1 RS | bit 18 : set immediate address | bits 19-31 : source 2 S2

Alternative format : bits 13-31 hold a relative address Y

Most instructions are register-to-register types

RD ← f ( RS, S2 ) : the rightmost 5 bits of S2 define a second source register

If bit 18 is set to 1, then S2 is interpreted as a 13-bit constant or immediate address. In this case, S2 is automatically expanded to 32 bits by sign extension.

Memory is addressed by using RS as an index register and S2 as a 13-bit offset ( effective address RS + S2 ; operand M(RS + S2) )

Setting RS = 0 → direct addressing

Setting S2 = 0 → indirect addressing ( through register RS )

No explicit I/O instructions : memory-mapped I/O

Multiple-word operations : through add/subtract-with-carry instructions

64-bit addition C ← A + B

① Apply the ADD instruction to the right halves of A and B.

② Apply the ADDC instruction to the left halves of A and B.
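The two-step 64-bit addition can be sketched with 32-bit operations. The function names are illustrative; `add32` models both ADD ( carry-in 0 ) and ADDC ( carry-in taken from the previous addition ).

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """32-bit addition returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64(a, b):
    """64-bit addition built from two 32-bit additions."""
    low, carry = add32(a & MASK32, b & MASK32)     # step 1: ADD on the right halves
    high, _ = add32(a >> 32, b >> 32, carry)       # step 2: ADDC on the left halves
    return (high << 32) | low

print(hex(add64(0x00000001FFFFFFFF, 0x0000000000000001)))
```

The carry out of the right halves is the only information passed between the two steps.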

Logical instructions : AND, OR, XOR, SLL, SRL, SRA ( arithmetic right-shift )

Program control : allows hardware support for parameter passing in high-level languages.

RISC I allows the passing of parameters during subroutine calls and returns to be done rapidly using its CPU registers ( cf. most computers employ a memory stack, which is a slow operation ).

Each subroutine is assigned, from the 138 CPU registers, a virtual set of 32 registers ( R0 ~ R31 ) for storing its input and output parameters ( a register window ).

When subroutine A calls subroutine B, the register window assigned to B overlaps that of A. The output parameter part of A’s window and the input parameter part of B’s window are assigned to the same physical registers, so the parameters are immediately accessible.
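The overlap of register windows can be sketched as a mapping from virtual to physical register numbers. The layout below is an assumption chosen to be consistent with the figures in the text ( 138 physical registers, 32 visible per subroutine ): 10 global registers, 6 overlapping parameter registers, 10 local registers, and 8 windows. The direction in which the window index moves on a call is also an assumption.

```python
GLOBALS, OVERLAP, LOCALS, WINDOWS = 10, 6, 10, 8   # assumed layout
STEP = OVERLAP + LOCALS                  # registers that are new in each window
TOTAL = GLOBALS + WINDOWS * STEP         # 10 + 8 * 16 = 138

def physical(window, r):
    """Map virtual register r (0..31) of a window to a physical register number."""
    if r < GLOBALS:
        return r                         # global registers are shared by all windows
    return GLOBALS + (window * STEP + (r - GLOBALS)) % (WINDOWS * STEP)

def callee_window(window):
    """Window used by a called subroutine (assumed to move down by one)."""
    return (window - 1) % WINDOWS

# Virtual registers 10..15 hold outgoing parameters; 26..31 hold incoming ones.
a = 3
b = callee_window(a)
print([physical(a, 10 + k) for k in range(OVERLAP)])
print([physical(b, 26 + k) for k in range(OVERLAP)])
```

Both lines list the same physical registers, so subroutine B finds its input parameters in place without any copying.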

3.3.3 Programming Consideration

Assembly language programming

Assembler : translates assembly-language instructions into the equivalent machine instructions.

Tools to simplify the program

- Pseudo instruction : not part of object code

- Macro instruction ( macro ) : part of object code

- Subroutine : part of object code
