advanced computer architecture

CSE 8383 - Advanced Computer Architecture Week-3 Week of Jan 26, 2004 engr.smu.edu/~rewini/8383

Upload: md-mahedi-mahfuj

Post on 11-May-2015


Page 1: Advanced computer architecture

CSE 8383 - Advanced Computer Architecture

Week 3: Week of Jan 26, 2004

engr.smu.edu/~rewini/8383

Page 2: Advanced computer architecture

Contents
  Linear Pipelines
  Nonlinear Pipelines
  Instruction Pipelines
  Arithmetic Operations
  Design of Multifunction Pipeline

Page 3: Advanced computer architecture

Linear Pipeline
  Processing stages are linearly connected
  Each stage performs a fixed function
  Synchronous Pipeline
    Clocked latches between Stage i and Stage i+1
    Equal delays in all stages
  Asynchronous Pipeline (Handshaking)

Page 4: Advanced computer architecture

Latches

[Figure: stages S1, S2, S3 separated by latches L1 and L2]

With equal delays, the stage delay sets the clock period; otherwise the slowest stage determines the delay.
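The rule on this slide can be written as a one-line calculation. This is a minimal sketch: the function name `clock_period`, the optional `latch_delay` parameter, and the sample delays are illustrative assumptions, not values from the slides.

```python
def clock_period(stage_delays, latch_delay=0.0):
    """Clock period of a synchronous pipeline.

    The slowest stage sets the period. latch_delay is an assumed
    extra term for the latch between stages (default 0).
    """
    return max(stage_delays) + latch_delay


# Hypothetical delays for S1, S2, S3 in arbitrary time units.
print(clock_period([10, 25, 15]))        # slowest stage: 25
print(clock_period([20, 20, 20]))        # equal delays: 20
```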

Page 5: Advanced computer architecture

Reservation Table

Time  1  2  3  4
S1    X
S2       X
S3          X
S4             X

Page 6: Advanced computer architecture

5 tasks on 4 stages

Time  1  2  3  4  5  6  7  8
S1    X  X  X  X  X
S2       X  X  X  X  X
S3          X  X  X  X  X
S4             X  X  X  X  X
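The space-time diagram for n tasks on a k-stage linear pipeline can be generated mechanically. The sketch below assumes one task enters per cycle and every stage takes one cycle; the function names are illustrative. The completion time k + n - 1 and the speedup formula are the standard linear-pipeline results, not figures quoted on the slide.

```python
def space_time(num_stages, num_tasks):
    """Return {stage name: cycles in which that stage is busy}.

    Assumes task t (0-based) enters stage S1 in cycle t + 1 and
    moves one stage per cycle.
    """
    return {
        f"S{s + 1}": [t + s + 1 for t in range(num_tasks)]
        for s in range(num_stages)
    }


def total_cycles(num_stages, num_tasks):
    """Cycles to finish all tasks: k to fill the pipe, then one per task."""
    return num_stages + num_tasks - 1


def speedup(num_stages, num_tasks):
    """Speedup over an unpipelined unit that needs k cycles per task."""
    return (num_stages * num_tasks) / total_cycles(num_stages, num_tasks)


for stage, cycles in space_time(4, 5).items():
    print(stage, cycles)
print(total_cycles(4, 5), speedup(4, 5))
```

For 5 tasks on 4 stages this gives 8 cycles in total, against 20 cycles without pipelining.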

Page 7: Advanced computer architecture

Nonlinear Pipelines
  Variable functions
  Feed-forward connections
  Feedback connections

Page 8: Advanced computer architecture

3 stages & 2 functions

[Figure: three-stage pipeline S1, S2, S3 evaluating two functions, X and Y]

Page 9: Advanced computer architecture

Reservation Tables for X & Y

Function X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X

Function Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y

Page 10: Advanced computer architecture

Linear Instruction Pipelines
Assume the following instruction execution phases:
  Fetch (F)
  Decode (D)
  Operand Fetch (O)
  Execute (E)
  Write results (W)

Page 11: Advanced computer architecture

Pipeline Instruction Execution

Cycle  1   2   3   4   5   6   7
F      I1  I2  I3
D          I1  I2  I3
O              I1  I2  I3
E                  I1  I2  I3
W                      I1  I2  I3

Page 12: Advanced computer architecture

Dependencies
  Data Dependency (operand is not ready yet)
  Instruction Dependency (branching)

Will that cause a problem?

Page 13: Advanced computer architecture

Data Dependency

I1 -- Add R1, R2, R3
I2 -- Sub R4, R1, R5

Cycle  1   2   3   4   5   6
F      I1  I2
D          I1  I2
O              I1  I2
E                  I1  I2
W                      I1  I2

Page 14: Advanced computer architecture

Solutions
  STALL
  Forwarding
  Write and Read in one cycle
  ….
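To see how long a stall the Add/Sub pair needs, the following sketch counts cycles in the five-phase pipeline (F, D, O, E, W). It assumes the first register of each instruction is the destination, one phase per cycle, and no forwarding; the tuple format and function names are illustrative assumptions.

```python
PHASES = ["F", "D", "O", "E", "W"]


def has_raw_dependency(first, second):
    """True if `second` reads the register that `first` writes.

    Assumed instruction format: (opcode, destination, source1, source2).
    """
    return first[1] in second[2:]


def raw_stall_cycles(producer_issue, consumer_issue, same_cycle_write_read=False):
    """Cycles the consumer must wait so its O phase sees the producer's W phase.

    Issue cycles are the cycles in which each instruction is fetched.
    With same_cycle_write_read, the register can be written and read
    in one cycle; otherwise the read happens one cycle after the write.
    """
    write_cycle = producer_issue + PHASES.index("W")
    read_cycle = consumer_issue + PHASES.index("O")
    earliest_read = write_cycle if same_cycle_write_read else write_cycle + 1
    return max(0, earliest_read - read_cycle)


i1 = ("Add", "R1", "R2", "R3")
i2 = ("Sub", "R4", "R1", "R5")
if has_raw_dependency(i1, i2):
    print(raw_stall_cycles(1, 2))                              # plain stall
    print(raw_stall_cycles(1, 2, same_cycle_write_read=True))  # write and read in one cycle
```

Under these assumptions I1 writes R1 in cycle 5 while I2 would fetch operands in cycle 4, so I2 stalls two cycles, or one cycle if the write and read can share a cycle. Forwarding is not modeled here.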

Page 15: Advanced computer architecture

Instruction Dependency

I1 – Branch o

I2 –

Cycle  1   2   3   4   5   6
F      I1  I2
D          I1  I2
O              I1  I2
E                  I1  I2
W                      I1  I2

Page 16: Advanced computer architecture

Solutions
  STALL
  Predict Branch taken
  Predict Branch not taken
  ….

Page 17: Advanced computer architecture

Floating-Point Multiplication
  Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2)
  Add the two exponents to get Exponent-out
  Multiply the two mantissas
  Normalize the mantissa and adjust the exponent
  Round the product mantissa to a single-length mantissa; the exponent may need adjusting
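The steps above can be traced in a few lines of code. This is a sketch under stated assumptions, not the hardware algorithm: mantissas are Python floats normalized to [0.5, 1) as returned by `math.frexp`, the base is 2, and `mantissa_bits` is an illustrative rounding width.

```python
import math


def fp_multiply(m1, e1, m2, e2, mantissa_bits=8):
    """Multiply (m1 * 2**e1) by (m2 * 2**e2), following the slide's steps."""
    exponent = e1 + e2                      # add the two exponents
    product = m1 * m2                       # multiply the two mantissas
    if product == 0.0:
        return 0.0, 0
    product, shift = math.frexp(product)    # normalize into [0.5, 1)
    exponent += shift                       # adjust the exponent
    scale = 1 << mantissa_bits
    product = round(product * scale) / scale  # round to a single-length mantissa
    if abs(product) >= 1.0:                 # rounding overflowed: renormalize
        product /= 2
        exponent += 1
    return product, exponent


m1, e1 = math.frexp(6.0)    # (0.75, 3)
m2, e2 = math.frexp(2.5)    # (0.625, 2)
m, e = fp_multiply(m1, e1, m2, e2)
print(m, e, math.ldexp(m, e))
```

The final renormalization check corresponds to the slide's remark that the exponent may need adjusting after rounding.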

Page 18: Advanced computer architecture

Linear Pipeline for floating-point multiplication

[Figure: a coarse pipeline with blocks Add Exponents, Multiply Mantissa, Normalize, Round; and a refined pipeline with blocks Partial Products, Accumulator, Add Exponents, Normalize, Round, Renormalize]

Page 19: Advanced computer architecture

Linear Pipeline for floating-point Addition

[Figure: pipeline with blocks Subtract Exponents, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]

Page 20: Advanced computer architecture

Combined Adder and Multiplier

[Figure: combined pipeline with blocks Exponents Subtract / Add, Partial Products, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize; the stages are labeled A through H]

Page 21: Advanced computer architecture

Reservation Table for Multiply

1 2 3 4 5 6 7

A X

B X X

C X X

D X X

E X

F

G

H

Page 22: Advanced computer architecture

Reservation Table for Addition

1 2 3 4 5 6 7 8 9

A Y

B

C Y

D Y

E Y

F Y Y

G Y

H Y Y

Page 23: Advanced computer architecture

Nonlinear Pipeline Design
  Latency: the number of clock cycles between two initiations of a pipeline
  Collision: a resource conflict
  Forbidden Latencies: latencies that cause collisions

Page 24: Advanced computer architecture

Nonlinear Pipeline Design (cont.)
  Latency Sequence: a sequence of permissible latencies between successive task initiations
  Latency Cycle: a sequence that repeats the same subsequence
  Collision vector: C = (C_m, C_{m-1}, …, C_2, C_1), m <= n - 1
    n = number of columns in the reservation table
    C_i = 1 if latency i causes a collision, 0 otherwise
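These definitions translate directly into code: a latency is forbidden when some stage is marked in two columns that far apart. The sketch below is illustrative. The dictionary format and function names are assumptions, and the column positions for functions X and Y are assumed values chosen to agree with the forbidden latencies the deck lists for them (2, 4, 5, 7 and 2, 4).

```python
def forbidden_latencies(table):
    """Set of latencies that cause a collision.

    `table` maps each stage to the list of time columns it is used in.
    """
    latencies = set()
    for cycles in table.values():
        for i, first in enumerate(cycles):
            for second in cycles[i + 1:]:
                latencies.add(abs(second - first))
    return latencies


def collision_vector(table):
    """Collision vector as a string C_m ... C_1, m = largest forbidden latency."""
    forbidden = forbidden_latencies(table)
    m = max(forbidden, default=0)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Assumed reservation tables for functions X and Y (stage -> columns used).
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
TABLE_Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}

print(sorted(forbidden_latencies(TABLE_X)), collision_vector(TABLE_X))
print(sorted(forbidden_latencies(TABLE_Y)), collision_vector(TABLE_Y))
```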

Page 25: Advanced computer architecture

Mul – Mul Collision (launch after 1 cycle)

1 2 3 4 5 6 7

A X Z

B X X Z Z

C X X Z Z

D X Z X

E X Z

F

G

H

Page 26: Advanced computer architecture

Mul – Mul Collision (launch after 2 cycles)

1 2 3 4 5 6 7

A X Z

B X X Z Z

C X X Z Z

D X X Z

E X

F

G

H

Page 27: Advanced computer architecture

Mul – Mul Collision (launch after 3 cycles)

1 2 3 4 5 6 7

A X Z

B X X Z Z

C X X Z Z

D X X

E X

F

G

H

Page 28: Advanced computer architecture

Collision Vector for Multiply after Multiply

Forbidden latencies: 1, 2

Maximum forbidden latency = 2, so m = 2

Collision vector: 0 0 0 0 1 1, i.e. C = (1 1)

Page 29: Advanced computer architecture

Example

[Figure: three-stage pipeline S1, S2, S3 evaluating two functions, X and Y]

Page 30: Advanced computer architecture

Reservation Tables for X & Y

Function X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X

Function Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y

Page 32: Advanced computer architecture

Forbidden Latencies
  X after X
  X after Y
  Y after X
  Y after Y

Page 33: Advanced computer architecture

X after X

[Figures: overlapped reservation tables for two initiations, X1 and X2, of function X on stages S1, S2, S3, showing the collisions at latencies 5 and 2]

Page 34: Advanced computer architecture

X after X

[Figures: overlapped reservation tables for two initiations, X1 and X2, of function X on stages S1, S2, S3, showing the collisions at latencies 4 and 7]

Page 35: Advanced computer architecture

Collision Vector
  Forbidden Latencies: 2, 4, 5, 7
  Collision Vector = 1 0 1 1 0 1 0

Page 36: Advanced computer architecture

Y after Y

[Figures: overlapped reservation tables for two initiations of function Y on stages S1, S2, S3, showing the collisions]

Page 37: Advanced computer architecture

Collision Vector
  Forbidden Latencies: 2, 4
  Collision Vector = 1 0 1 0

Page 38: Advanced computer architecture

Exercise – Find the collision vector

1 2 3 4 5 6 7

A X X X

B X X

C X X

D X

Page 39: Advanced computer architecture

State Diagram for X

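Initial state: 1011010
Other states: 1111111 and 1011011

Transitions (latency: from -> to; * marks the minimum latency):
  1*     1011010 -> 1111111
  3, 6   1011010 -> 1011011
  8+     1011010 -> 1011010
  8+     1111111 -> 1011010
  3*, 6  1011011 -> 1011011
  8+     1011011 -> 1011010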
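The state diagram follows mechanically from the collision vector: for a permissible latency k, shift the current state right by k bits and OR it with the initial collision vector; any latency greater than m returns to the initial state. The sketch below is a minimal illustration with assumed names, and it represents the "8+" edges by the single latency m + 1.

```python
def state_diagram(cv_bits):
    """Build {state: {latency: next_state}} from a collision vector string.

    States are integers whose bit k-1 holds C_k, so int(cv_bits, 2)
    is the initial state. Latency m + 1 stands for every latency > m.
    """
    m = len(cv_bits)
    initial = int(cv_bits, 2)
    transitions = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in transitions:
            continue
        edges = {}
        for k in range(1, m + 1):
            if not (state >> (k - 1)) & 1:        # latency k is permissible
                edges[k] = (state >> k) | initial
        edges[m + 1] = initial                    # the "m+1 or more" edge
        transitions[state] = edges
        pending.extend(edges.values())
    return transitions


diagram = state_diagram("1011010")
for state, edges in sorted(diagram.items()):
    for latency, target in edges.items():
        print(f"{state:07b} --{latency}--> {target:07b}")
```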

Page 40: Advanced computer architecture

Cycles
  Simple cycles: each state appears only once
    (3), (6), (8), (1, 8), (3, 8), and (6, 8)
  Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states
    (1, 8) and (3); one of them gives the minimal average latency (MAL). Here it is (3), with average latency 3, against 4.5 for (1, 8).
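The greedy cycles and the MAL can be found by always following the smallest permissible latency from every reachable state until a state repeats. This is a sketch with assumed function names; latencies above m are represented by m + 1, matching the "8+" edges of the state diagram.

```python
from fractions import Fraction


def greedy_cycles(cv_bits):
    """Set of greedy cycles (as latency tuples) for a collision vector string."""
    m = len(cv_bits)
    initial = int(cv_bits, 2)

    def permissible(state):
        return [k for k in range(1, m + 1) if not (state >> (k - 1)) & 1]

    def greedy_step(state):
        options = permissible(state)
        if options:
            k = options[0]                       # minimum permissible latency
            return k, (state >> k) | initial
        return m + 1, initial

    # Collect every state reachable from the initial collision vector.
    states, pending = set(), [initial]
    while pending:
        state = pending.pop()
        if state not in states:
            states.add(state)
            pending.extend((state >> k) | initial for k in permissible(state))

    cycles = set()
    for start in states:
        path, seen, state = [], {}, start
        while state not in seen:
            seen[state] = len(path)
            latency, state = greedy_step(state)
            path.append(latency)
        cycle = tuple(path[seen[state]:])
        # Store one canonical rotation so (8, 1) and (1, 8) count once.
        cycles.add(min(cycle[i:] + cycle[:i] for i in range(len(cycle))))
    return cycles


def minimal_average_latency(cv_bits):
    """Smallest average latency over the greedy cycles."""
    return min(Fraction(sum(c), len(c)) for c in greedy_cycles(cv_bits))


print(sorted(greedy_cycles("1011010")))
print(minimal_average_latency("1011010"))
```

For the collision vector 1011010 of function X this yields the greedy cycles (1, 8) and (3), and a MAL of 3.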