advanced computer architecture
CSE 8383 - Advanced Computer Architecture
Week 3: Week of Jan 26, 2004
engr.smu.edu/~rewini/8383
Contents
Linear Pipelines
Nonlinear Pipelines
Instruction Pipelines
Arithmetic Operations
Design of Multifunction Pipeline
Linear Pipeline
Processing stages are linearly connected
Each stage performs a fixed function
Synchronous Pipeline
Clocked latches between Stage i and Stage i+1
Equal delays in all stages
Asynchronous Pipeline (Handshaking)
Latches
[Figure: stages S1, S2, S3 separated by latches L1 and L2]
Equal delays: clock period
Slowest stage determines delay
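The timing rule above can be sketched in a few lines. This is a minimal illustration, not part of the lecture: the function name, the stage delays, and the optional latch overhead are all assumptions.

```python
def clock_period(stage_delays, latch_delay=0.0):
    """Clock period of a synchronous linear pipeline.

    The slowest stage sets the period. latch_delay is an assumed
    per-latch overhead (0 by default, i.e. ideal latches).
    """
    return max(stage_delays) + latch_delay


# Hypothetical delays (in ns) for stages S1, S2, S3.
delays = [8.0, 10.0, 9.0]
period = clock_period(delays, latch_delay=1.0)
print(period)  # 11.0
```

Every stage is given the same clock period, so the two faster stages sit idle for part of each cycle.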
Reservation Table
Time  1  2  3  4
S1    X
S2       X
S3          X
S4             X
5 tasks on 4 stages
Time  1  2  3  4  5  6  7  8
S1    X  X  X  X  X
S2       X  X  X  X  X
S3          X  X  X  X  X
S4             X  X  X  X  X
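The cycle count for this case can be checked with a short sketch. It assumes an ideal linear pipeline (one task enters per cycle, no stalls); the function names are hypothetical.

```python
def pipeline_cycles(n_tasks, k_stages):
    """Cycles to finish n_tasks on a k_stages linear pipeline with no stalls.

    The first task needs k_stages cycles; each later task finishes
    one cycle after the previous one.
    """
    return k_stages + (n_tasks - 1)


def speedup(n_tasks, k_stages):
    """Speedup over a non-pipelined unit that needs k_stages cycles per task."""
    return (n_tasks * k_stages) / pipeline_cycles(n_tasks, k_stages)


print(pipeline_cycles(5, 4))  # 8
print(speedup(5, 4))          # 2.5
```

As the number of tasks grows, the speedup approaches the number of stages.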
Nonlinear Pipelines
Variable functions
Feed-Forward
Feedback
3 stages & 2 functions
[Figure: three-stage pipeline S1, S2, S3 with two outputs, X and Y]
Reservation Tables for X & Y
Reservation table for X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X
Reservation table for Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y
Linear Instruction Pipelines
Assume the following instruction execution phases:
Fetch (F)
Decode (D)
Operand Fetch (O)
Execute (E)
Write results (W)
Pipeline Instruction Execution
Cycle  1   2   3   4   5   6   7
F      I1  I2  I3
D          I1  I2  I3
O              I1  I2  I3
E                  I1  I2  I3
W                      I1  I2  I3
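A space-time diagram like this one can be generated mechanically. The sketch below assumes the five phases in the order F, D, O, E, W and an ideal pipeline with no stalls; the helper names are hypothetical.

```python
def space_time(n_instructions, phases=("F", "D", "O", "E", "W")):
    """Return {phase: {cycle: instruction_label}} for an ideal pipeline."""
    table = {p: {} for p in phases}
    for i in range(n_instructions):
        for s, p in enumerate(phases):
            # Instruction i (0-based) occupies phase s in cycle i + s + 1.
            table[p][i + s + 1] = f"I{i + 1}"
    return table


def render(table, n_cycles):
    """Format the table as one text row per phase."""
    lines = []
    for phase, row in table.items():
        cells = [row.get(c, "").ljust(3) for c in range(1, n_cycles + 1)]
        lines.append(f"{phase}  " + " ".join(cells).rstrip())
    return "\n".join(lines)


t = space_time(3)
print(render(t, 7))
```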
Dependencies
Data Dependency (Operand is not ready yet)
Instruction Dependency (Branching)
Will that cause a problem?
Data Dependency
I1 -- Add R1, R2, R3
I2 -- Sub R4, R1, R5
Cycle  1   2   3   4   5   6
F      I1  I2
D          I1  I2
O              I1  I2
E                  I1  I2
W                      I1  I2
Solutions
STALL
Forwarding
Write and Read in one cycle
….
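The cost of the STALL option can be estimated with a small sketch. It assumes, for illustration only, that the first register named in each instruction is the destination, that registers are read in phase O and written in phase W, and that there is no forwarding. The function name is hypothetical.

```python
PHASES = ("F", "D", "O", "E", "W")


def stall_cycles(producer_issue, consumer_issue, same_cycle_rw=False):
    """Stall cycles so the consumer's O phase sees the producer's W result.

    producer_issue / consumer_issue are the cycles in which each
    instruction enters F. If same_cycle_rw is True, a write and a read
    of the same register may share one cycle.
    """
    write_cycle = producer_issue + PHASES.index("W")  # producer's W cycle
    read_cycle = consumer_issue + PHASES.index("O")   # consumer's O cycle
    earliest_read = write_cycle if same_cycle_rw else write_cycle + 1
    return max(0, earliest_read - read_cycle)


# I1 (Add R1, R2, R3) enters at cycle 1; I2 (Sub R4, R1, R5) at cycle 2.
print(stall_cycles(1, 2))                      # 2
print(stall_cycles(1, 2, same_cycle_rw=True))  # 1
```

Under these assumptions I1 writes R1 in cycle 5 while I2 would read it in cycle 4, so I2 must wait; allowing the write and the read in one cycle saves one of the two stall cycles.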
Instruction Dependency
I1 – Branch o
I2 –
Cycle  1   2   3   4   5   6
F      I1  I2
D          I1  I2
O              I1  I2
E                  I1  I2
W                      I1  I2
Solutions
STALL
Predict Branch taken
Predict Branch not taken
….
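A rough way to compare these options is to estimate the average cycles per instruction under each one. The sketch below is a simplified model, not taken from the lecture: the branch frequency, the taken ratio, and the three-cycle penalty (branch resolved in phase E) are all assumed values, and the function name is hypothetical.

```python
def effective_cpi(branch_fraction, wrong_fraction, penalty_cycles):
    """Average cycles per instruction for an otherwise ideal pipeline (base CPI 1)."""
    return 1.0 + branch_fraction * wrong_fraction * penalty_cycles


# All numbers below are assumed, for illustration only.
BRANCH_FRACTION = 0.2  # share of instructions that are branches
TAKEN = 0.6            # share of branches that are taken
PENALTY = 3            # lost cycles if the branch resolves in E (assumed)

stall = effective_cpi(BRANCH_FRACTION, 1.0, PENALTY)          # always wait
not_taken = effective_cpi(BRANCH_FRACTION, TAKEN, PENALTY)    # wrong when taken
taken = effective_cpi(BRANCH_FRACTION, 1.0 - TAKEN, PENALTY)  # wrong when not taken
print(round(stall, 2), round(not_taken, 2), round(taken, 2))  # 1.6 1.36 1.24
```

The model ignores the cost of computing the branch target early, which a real predict-taken scheme would also have to pay.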
Floating Point Multiplication
Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2)
Add the two exponents to get Exponent-out
Multiply the two mantissas
Normalize the mantissa and adjust the exponent
Round the product mantissa to a single-length mantissa; the exponent may need adjusting
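The steps above can be traced in a short sketch. It assumes positive operands with mantissas normalized to the range [0.5, 1), as returned by Python's `math.frexp`, and an arbitrary 8-bit mantissa width for the rounding step; the function name is hypothetical.

```python
import math


def fp_multiply(m1, e1, m2, e2, bits=8):
    """Multiply (m1 * 2**e1) by (m2 * 2**e2), mantissas in [0.5, 1).

    bits is an assumed mantissa width used for rounding.
    """
    exp = e1 + e2                     # add the two exponents
    man = m1 * m2                     # multiply the two mantissas
    if man != 0 and man < 0.5:        # normalize and adjust the exponent
        man *= 2
        exp -= 1
    scale = 1 << bits
    man = round(man * scale) / scale  # round to a single-length mantissa
    if man >= 1.0:                    # rounding overflowed: adjust exponent
        man /= 2
        exp += 1
    return man, exp


m1, e1 = math.frexp(6.0)  # (0.75, 3)
m2, e2 = math.frexp(2.5)  # (0.625, 2)
m, e = fp_multiply(m1, e1, m2, e2)
print(m, e, math.ldexp(m, e))  # 0.9375 4 15.0
```

Each commented step corresponds to one stage of a multiplication pipeline, which is why the operation splits naturally into stages.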
Linear Pipeline for floating-point multiplication
[Figure: four-stage pipeline: Add Exponents, Multiply Mantissa, Normalize, Round]
[Figure: refined pipeline with stages Partial Products, Accumulator, Add Exponents, Normalize, Round, Renormalize]
Linear Pipeline for floating-point Addition
[Figure: pipeline with stages Subtract Exponents, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]
Combined Adder and Multiplier
[Figure: combined pipeline with stages labeled A through H, built from Exponents Subtract / Add, Partial Products, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]
Reservation Table for Multiply
1 2 3 4 5 6 7
A X
B X X
C X X
D X X
E X
F
G
H
Reservation Table for Addition
1 2 3 4 5 6 7 8 9
A Y
B
C Y
D Y
E Y
F Y Y
G Y
H Y Y
Nonlinear Pipeline Design
Latency: the number of clock cycles between two initiations of a pipeline
Collision: a resource conflict
Forbidden Latencies: latencies that cause collisions
Nonlinear Pipeline Design (cont.)
Latency Sequence: a sequence of permissible latencies between successive task initiations
Latency Cycle: a latency sequence that repeats the same subsequence
Collision vector: C = (Cm, Cm-1, …, C2, C1), m <= n-1
n = number of columns in the reservation table
Ci = 1 if latency i causes a collision, 0 otherwise
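These definitions translate directly into code: two marks in the same row of a reservation table, d columns apart, make latency d forbidden. The sketch below uses a small made-up table, chosen only to illustrate the definitions; the function names are hypothetical.

```python
from itertools import combinations


def forbidden_latencies(table):
    """table maps stage name -> list of clock columns marked in that row."""
    forbidden = set()
    for columns in table.values():
        # Any two marks in one row collide at a latency equal to their distance.
        for a, b in combinations(columns, 2):
            forbidden.add(abs(a - b))
    return sorted(forbidden)


def collision_vector(forbidden):
    """Return the bits (Cm, ..., C1) as a string, m = largest forbidden latency."""
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Hypothetical 3-stage table, not one of the tables from the lecture.
table = {"S1": [1, 4], "S2": [2, 3], "S3": [5]}
f = forbidden_latencies(table)
print(f)                    # [1, 3]
print(collision_vector(f))  # 101
```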
Mul – Mul Collision (launch after 1 cycle)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X Z X
E X Z
F
G
H
Mul – Mul Collision (launch after 2 cycles)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X X Z
E X
F
G
H
Mul – Mul Collision (launch after 3 cycles)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X X
E X
F
G
H
Collision Vector for Multiply after Multiply
Forbidden Latencies: 1, 2
Collision vector: 0 0 0 0 1 1, which reduces to 1 1
Maximum forbidden latency = 2, so m = 2
Example
[Figure: three-stage pipeline S1, S2, S3 with two outputs, X and Y]
Reservation Tables for X & Y
Reservation table for X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X
Reservation table for Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y
Forbidden Latencies
X after X
X after Y
Y after X
Y after Y
X after X
[Figure: overlaid reservation tables for two initiations X1 and X2 on stages S1, S2, S3, showing collisions at latencies 5 and 2]
X after X
[Figure: overlaid reservation tables for two initiations X1 and X2 on stages S1, S2, S3, showing collisions at latencies 4 and 7]
Collision Vector
Forbidden Latencies: 2, 4, 5, 7
Collision Vector = 1 0 1 1 0 1 0
Y after Y
[Figure: overlaid reservation tables for successive Y initiations on stages S1, S2, S3]
Collision Vector
Forbidden Latencies: 2, 4
Collision Vector = 1 0 1 0
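All four cases (X after X, X after Y, Y after X, Y after Y) can be computed with one routine. The sketch below is an illustration under a stated assumption: the column positions in `X` and `Y` are assumed, chosen to be consistent with the forbidden latencies stated above (2, 4, 5, 7 for X and 2, 4 for Y). The function name is hypothetical.

```python
# Assumed column positions for each stage (not given explicitly in the text).
X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}


def forbidden(first, second):
    """Latencies at which `second`, started after `first`, collides with it.

    A collision occurs when `first` uses a stage in column a and `second`
    uses the same stage in column b, with a - b equal to the latency.
    """
    out = set()
    for stage, first_cols in first.items():
        for a in first_cols:
            for b in second.get(stage, []):
                if a > b:
                    out.add(a - b)
    return sorted(out)


print("X after X:", forbidden(X, X))  # [2, 4, 5, 7]
print("Y after Y:", forbidden(Y, Y))  # [2, 4]
print("Y after X:", forbidden(X, Y))
print("X after Y:", forbidden(Y, X))
```

The two mixed cases depend entirely on the assumed column positions, so their results are only as reliable as that assumption.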
Exercise – Find the collision vector
1 2 3 4 5 6 7
A X X X
B X X
C X X
D X
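State Diagram for X
States: 1 0 1 1 0 1 0 (initial), 1 1 1 1 1 1 1, and 1 0 1 1 0 1 1
From 1 0 1 1 0 1 0: latency 1* -> 1 1 1 1 1 1 1; latency 3 or 6 -> 1 0 1 1 0 1 1; latency 8+ -> 1 0 1 1 0 1 0
From 1 1 1 1 1 1 1: latency 8+ -> 1 0 1 1 0 1 0
From 1 0 1 1 0 1 1: latency 3* or 6 -> 1 0 1 1 0 1 1; latency 8+ -> 1 0 1 1 0 1 0
(* marks the minimum latency leaving a state)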
Cycles
Simple cycles: each state appears only once
(3), (6), (8), (1, 8), (3, 8), and (6, 8)
Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states
(1, 8) and (3); one of them gives the MAL (minimum average latency)
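The state diagram and its greedy cycles can be derived from the collision vector alone. The sketch below uses the usual shift-and-OR construction: for a permissible latency p, shift the current state right by p bits and OR it with the initial collision vector. The vector is stored as an integer whose lowest bit is C1, latency 8 stands for "8 or more", and all function names are hypothetical.

```python
def transitions(state, initial, m):
    """Map each permissible latency (1..m, and m+1 for 'm+1 or more') to its next state."""
    nxt = {}
    for p in range(1, m + 1):
        if not (state >> (p - 1)) & 1:       # bit Cp is 0: latency p is allowed
            nxt[p] = (state >> p) | initial  # shift right p places, OR with C
    nxt[m + 1] = initial                     # any latency > m returns to C
    return nxt


def state_diagram(initial, m):
    """Build {state: {latency: next_state}} for every reachable state."""
    diagram, todo = {}, [initial]
    while todo:
        s = todo.pop()
        if s not in diagram:
            diagram[s] = transitions(s, initial, m)
            todo.extend(diagram[s].values())
    return diagram


def greedy_cycle(diagram, start):
    """Follow minimum-latency edges from start until a state repeats."""
    seen, path, s = {}, [], start
    while s not in seen:
        seen[s] = len(path)
        p = min(diagram[s])
        path.append(p)
        s = diagram[s][p]
    return path[seen[s]:]                    # latencies on the repeating part


C = 0b1011010                                # collision vector for X, m = 7
d = state_diagram(C, 7)
cycles = [greedy_cycle(d, s) for s in sorted(d)]
for s, cyc in zip(sorted(d), cycles):
    print(format(s, "07b"), cyc, sum(cyc) / len(cyc))
print(min(sum(c) / len(c) for c in cycles))  # 3.0
```

For this vector the greedy cycles found are (1, 8), with average latency 4.5, and (3), with average latency 3, so the smaller average comes from the cycle (3).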