COSC 6385 – Computer Architecture
Pipelining (II)
Edgar Gabriel
Fall 2006

Pipeline Hazards
• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: the hardware cannot support this combination of instructions
  – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  – Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
Slide based on a lecture by David Culler, University of California, Berkeley: http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

Three Generic Data Hazards
• Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it
  – Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
• Write After Read (WAR): InstrJ writes an operand before InstrI reads it
  – Called an “anti-dependence”
• Write After Write (WAW): InstrJ writes an operand before InstrI writes it
  – Called an “output dependence” by compiler writers
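
The three hazard classes can be checked mechanically from the destination and source registers of two instructions. The following is a minimal sketch, not part of the original slides: the function name `classify_hazards` and the (destination, sources) tuple encoding are assumptions made for illustration, and InstrI is taken to precede InstrJ in program order.

```python
def classify_hazards(instr_i, instr_j):
    """Return the potential data hazards between InstrI and a later InstrJ.

    Each instruction is a hypothetical (dest, srcs) pair, e.g. ("F0", ("F2", "F4")).
    """
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    hazards = []
    if dest_i in srcs_j:      # J reads what I writes
        hazards.append("RAW")
    if dest_j in srcs_i:      # J writes what I reads
        hazards.append("WAR")
    if dest_i == dest_j:      # both write the same register
        hazards.append("WAW")
    return hazards


div = ("F0", ("F2", "F4"))    # DIV.D F0, F2, F4
add = ("F10", ("F0", "F8"))   # ADD.D F10, F0, F8
sub = ("F8", ("F8", "F14"))   # SUB.D F8, F8, F14

print(classify_hazards(div, add))  # ['RAW']
print(classify_hazards(add, sub))  # ['WAR']
```

Whether a potential hazard actually causes a stall depends on the pipeline organization; the sketch only reports which register overlaps exist.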

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
  – Execute successor instructions in sequence
#3: Predict Branch Taken
  – The branch target address has not been calculated yet, so this still incurs a 1-cycle branch penalty
#4: Delayed Branch
  – Define the branch to take place AFTER a following instruction
  – A 1-slot delay allows proper decision and branch target address in a 5-stage pipeline

Performance evaluation of pipelines (I)
$$\text{Speedup} = \frac{\text{ExecutionTime}_{org}}{\text{ExecutionTime}_{enh}} = \frac{IC_{org} \times ClockCycle_{org} \times CPI_{org}}{IC_{enh} \times ClockCycle_{enh} \times CPI_{enh}}$$
If $IC_{org} = IC_{enh}$:
$$\text{Speedup} = \frac{ClockCycle_{org} \times CPI_{org}}{ClockCycle_{enh} \times CPI_{enh}}$$
If $IC_{org} = IC_{enh}$ and $ClockCycle_{org} = ClockCycle_{enh}$:
$$\text{Speedup} = \frac{CPI_{org}}{CPI_{enh}}$$

Performance evaluation of pipelines (II)
If looking at individual instructions:
$$\text{Speedup}_{overall} = \frac{\text{ExecutionTime}_{org}}{\text{ExecutionTime}_{enh}} = \frac{ClockCycle_{org} \times \sum_{i=1}^{n} IC_i^{org} \times CPI_i^{org}}{ClockCycle_{enh} \times \sum_{i=1}^{n} IC_i^{enh} \times CPI_i^{enh}}$$
If $IC_{total}$ does not change, you can also use the average instruction execution time (AvIETime):
$$\text{Speedup}_{overall} = \frac{\text{ExecutionTime}_{org}}{\text{ExecutionTime}_{enh}} = \frac{ClockCycle_{org} \times \sum_{i=1}^{n} f_i^{org} \times CPI_i^{org}}{ClockCycle_{enh} \times \sum_{i=1}^{n} f_i^{enh} \times CPI_i^{enh}}$$
with
$$f_i = \frac{IC_i}{IC_{total}}$$

Performance evaluation of pipelines (III)
• Comparing pipelined and non-pipelined execution:
$$\text{ExecutionTime}_{pipelined} = \frac{\text{ExecutionTime}_{non\text{-}pipelined}}{\text{num\_pipeline\_stages}}$$
$$\text{Speedup} = \frac{\text{ExecutionTime}_{non\text{-}pipelined}}{\text{ExecutionTime}_{pipelined}} = \text{num\_pipeline\_stages}$$
• Also:
$$\text{Speedup} = \frac{\text{AvIETime}_{non\text{-}pipelined}}{\text{AvIETime}_{pipelined}} = \frac{CPI_{non\text{-}pipelined}}{CPI_{pipelined}} \times \frac{ClockCycle_{non\text{-}pipelined}}{ClockCycle_{pipelined}}$$
Ideal $CPI_{pipelined}$ = 1
Realistic $CPI_{pipelined}$ = Ideal $CPI_{pipelined}$ + pipeline stall cycles per instruction

Performance evaluation of pipelines (IV)
$$\text{Speedup} = \frac{\text{AvIETime}_{non\text{-}pipelined}}{\text{AvIETime}_{pipelined}}$$
Thus:
$$\text{Speedup} = \frac{CPI_{non\text{-}pipelined}}{1 + \text{PipelineStallCyclesPerInstr}} \times \frac{ClockCycle_{non\text{-}pipelined}}{ClockCycle_{pipelined}}$$
If ClockCycle does not change:
$$\text{Speedup} = \frac{CPI_{non\text{-}pipelined}}{1 + \text{PipelineStallCyclesPerInstr}}$$
If all instructions take the same number of cycles (= number of pipeline stages):
$$\text{Speedup} = \frac{\text{num\_pipeline\_stages}}{1 + \text{PipelineStallCyclesPerInstr}}$$
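
The speedup formulas of this slide can be evaluated directly. The following is a small sketch under the slide's assumption that the ideal pipelined CPI is 1; the function and parameter names are chosen for illustration and do not come from the lecture.

```python
def pipeline_speedup(cpi_unpipelined, stall_cycles_per_instr,
                     clock_unpipelined=1.0, clock_pipelined=1.0):
    """Speedup of a pipelined over a non-pipelined implementation.

    Assumes an ideal pipelined CPI of 1, so the realistic pipelined CPI is
    1 + stall cycles per instruction. Clock values are cycle times.
    """
    cpi_pipelined = 1.0 + stall_cycles_per_instr
    return (cpi_unpipelined / cpi_pipelined) * (clock_unpipelined / clock_pipelined)


# Five-stage pipeline where every unpipelined instruction takes 5 cycles:
print(pipeline_speedup(5, 0.0))    # 5.0 (no stalls: speedup equals the pipeline depth)
print(pipeline_speedup(5, 0.25))   # 4.0 (0.25 stall cycles per instruction)
```

With equal clock cycle times the last factor is 1, which reduces the expression to the two special cases on the slide.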

Example pg A-10
• (A) Given a non-pipelined processor:
  – 1 ns clock cycle time
  – 4 cycles for ALU operations
  – 4 cycles for branches
  – 5 cycles for memory operations
• (B) Given also a pipelined processor:
  – 1.2 ns clock cycle time
• Both (A) and (B) have
  – 40% ALU operations
  – 20% branches
  – 40% memory operations
• What is the speedup of (B) over (A) due to pipelining?
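
Example pg A-10 (II)
For machine (A):
$$\text{AvIETime}_{(A)} = ClockCycle_A \times \sum_{i=1}^{n} f_i \times CPI_i^{A} = 1\,ns \times (0.4 \times 4 + 0.2 \times 4 + 0.4 \times 5) = 4.4\,ns$$
For machine (B), assuming ideal CPI (= 1):
$$\text{AvIETime}_{(B)} = ClockCycle_B \times \sum_{i=1}^{n} f_i \times CPI_i^{B} = 1.2\,ns \times (0.4 \times 1 + 0.2 \times 1 + 0.4 \times 1) = 1.2\,ns$$
Thus:
$$\text{Speedup} = \frac{\text{AvIETime}_{(A)}}{\text{AvIETime}_{(B)}} = \frac{4.4\,ns}{1.2\,ns} \approx 3.7$$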
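
The arithmetic of this example can be reproduced in a few lines. This is an illustrative sketch: the helper name `av_ie_time` and the list-of-pairs encoding of the instruction mix are assumptions, and the frequencies are the ones that appear in the AvIETime computation (0.4, 0.2, 0.4 for ALU, branch and memory operations).

```python
def av_ie_time(clock_cycle_ns, mix):
    """Average instruction execution time: clock cycle * sum(f_i * CPI_i)."""
    return clock_cycle_ns * sum(f * cpi for f, cpi in mix)


# (frequency, CPI) pairs for ALU operations, branches, memory operations
mix_a = [(0.4, 4), (0.2, 4), (0.4, 5)]   # machine (A), non-pipelined
mix_b = [(0.4, 1), (0.2, 1), (0.4, 1)]   # machine (B), ideal CPI = 1

time_a = av_ie_time(1.0, mix_a)          # about 4.4 ns
time_b = av_ie_time(1.2, mix_b)          # about 1.2 ns
speedup = time_a / time_b                # about 3.7

print(round(time_a, 2), round(time_b, 2), round(speedup, 2))
```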

Example: Dual-port vs. Single-port
• Machine A: dual-ported memory (“Harvard Architecture”)
• Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed
SpeedUpA = Pipeline Depth / (1 + 0) × (clock_unpipe / clock_pipe) = Pipeline Depth
SpeedUpB = Pipeline Depth / (1 + 0.4 × 1) × (clock_unpipe / (clock_unpipe / 1.05)) = (Pipeline Depth / 1.4) × 1.05 = 0.75 × Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth / (0.75 × Pipeline Depth) = 1.33
• Machine A is 1.33 times faster
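
The comparison can be checked numerically. The sketch below is illustrative only: the function name `speedup` and the pipeline depth of 5 are assumptions (the depth cancels in the ratio), and each load on machine B is assumed to cost one structural-hazard stall cycle, as in the calculation above.

```python
def speedup(pipeline_depth, stall_cycles_per_instr, clock_ratio=1.0):
    """Pipeline speedup = depth / (1 + stalls per instruction) * clock ratio.

    clock_ratio is the unpipelined cycle time divided by the pipelined one.
    """
    return pipeline_depth / (1.0 + stall_cycles_per_instr) * clock_ratio


depth = 5                                   # assumed; cancels out in the ratio below
speedup_a = speedup(depth, 0.0)             # dual-ported memory: no structural stalls
speedup_b = speedup(depth, 0.4 * 1, 1.05)   # 40% loads, 1 stall each, 1.05x faster clock

print(round(speedup_a / speedup_b, 2))      # 1.33
```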

Exceptions
• Instruction execution order is interrupted
• E.g.
  – I/O device request
  – Invoking an OS service from an application
  – Tracing execution
  – Breakpoint
  – Integer or FP arithmetic anomaly (e.g. overflow)
  – Page fault
  – Misaligned memory access
  – Memory protection violation
  – Hardware malfunction

Classification of Exceptions
• Problems with pipelining:
  – Different stages of the pipeline can raise exceptions, leading to a different order of exceptions compared to the unpipelined case
• Classes of exceptions
  1. Synchronous vs. asynchronous
  2. User requested vs. coerced
  3. User maskable vs. user non-maskable
  4. Within vs. between instructions
  5. Resume vs. terminate

Exceptions
• Most problematic: exceptions raised within instructions, where the instruction must be resumed
  – Another program must be invoked to save the state of the program
• Pipelines capable of handling exceptions are called restartable

Pipeline stage   Possible exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory protection violation
WB               None

Exceptions
• Since an exception cannot be handled at the moment it occurs:
  – A status vector associated with the instruction records the exception
  – The status vector is carried along with the instruction
  – Writing of data values is disabled once the status vector is set
  – In WB, the status vector is checked and the exception is handled
=> The exception of instruction i is handled before the exception of instruction i+1
=> Since no data values are written back, the register file is not changed, so the instruction can be repeated
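
To see why deferring exceptions to WB restores program order, consider the following sketch. It assumes an ideal five-stage pipeline without stalls in which instruction i (0-based) enters IF in cycle i+1; the function names and the `faults` dictionary are hypothetical and only serve the illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def cycle_in_stage(instr_index, stage):
    """Cycle in which instruction `instr_index` occupies `stage` (no stalls assumed)."""
    return instr_index + STAGES.index(stage) + 1


def exception_orders(faults):
    """faults maps an instruction index to the stage that raises its exception.

    Returns (order in which exceptions occur, order in which they are handled
    when the status vector is only checked in WB).
    """
    raised = sorted(faults, key=lambda i: cycle_in_stage(i, faults[i]))
    handled = sorted(faults, key=lambda i: cycle_in_stage(i, "WB"))
    return raised, handled


# Instruction 0 faults in MEM (cycle 4), instruction 1 already faults in IF (cycle 2).
raised, handled = exception_orders({0: "MEM", 1: "IF"})
print(raised)    # [1, 0]  the later instruction faults first in time
print(handled)   # [0, 1]  but exceptions are handled in program order
```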

Multi-cycle instructions
• Floating point instructions can take many cycles to complete
• Often implemented by multiple executions of the EX stage
  – Not all instructions will take the same number of cycles to finish!
• Latency:
  – Number of intervening cycles between an instruction that produces a result and an instruction that uses the result
  – Usually: depth of the EX stage - 1
• Initiation interval:
  – Number of cycles that must elapse between issuing two operations of a given type
• Multi-cycle instructions/pipelines increase the probability of WAW and RAW hazards

Example for a multi-cycle pipeline
IF, ID, then one of four execution paths, then MEM, WB:
  – EX: integer unit
  – M1 M2 M3 M4 M5 M6 M7: FP/integer multiply unit
  – A1 A2 A3 A4: FP/integer add unit
  – DIV: FP/integer division (non-pipelined)

Functional unit   Latency   Initiation interval
Integer ALU       0         1
Data memory       1         1
FP add            3         1
FP multiply       6         1
FP divide         24        25
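
The table can be turned into a small lookup to estimate stalls. This is a sketch under stated assumptions: results are forwarded as soon as they are produced, the consumer would otherwise issue in the very next cycle, and the function names are invented for this example.

```python
# Values copied from the table above.
LATENCY = {"Integer ALU": 0, "Data memory": 1, "FP add": 3,
           "FP multiply": 6, "FP divide": 24}
INITIATION_INTERVAL = {"Integer ALU": 1, "Data memory": 1, "FP add": 1,
                       "FP multiply": 1, "FP divide": 25}


def raw_stall_cycles(producer_unit, independent_instrs_between=0):
    """Stall cycles of a dependent instruction (RAW) behind `producer_unit`.

    Latency is the number of intervening cycles needed between producer and
    consumer; independent instructions in between hide part of it.
    """
    return max(0, LATENCY[producer_unit] - independent_instrs_between)


def earliest_next_issue(unit, issue_cycle):
    """Earliest cycle in which another operation of the same type may issue."""
    return issue_cycle + INITIATION_INTERVAL[unit]


print(raw_stall_cycles("FP multiply"))        # 6
print(raw_stall_cycles("FP add", 1))          # 2
print(earliest_next_issue("FP divide", 1))    # 26 (the divider is not pipelined)
```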

Instruction level parallelism
• Exploit parallelism between independent instructions
  – Limited by data dependencies
  – Limited by branches
• Example:
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
  – Each iteration of the loop is independent
  – Exploitation of that fact is not trivial because of register reuse!

Instruction level parallelism
• Data dependencies:
  – True dependencies: instruction i produces a result required by instruction i+k, k>0 (RAW)
    • Sharing a register or a memory location
  – Name dependencies: usage of the same register or memory location without data flow
    • Antidependence: instruction i+k writes a register/memory location read by instruction i (WAR)
      – No problem if instructions are not reordered
    • Output dependence: instruction i and instruction i+k write the same register/memory location (WAW)
      – No problem if instructions are not reordered
  – Control dependencies: determine the ordering of an instruction i with respect to a branch

Dynamic scheduling
• Up to now
  – Instructions are issued in program order
  – If an instruction is stalled in the pipeline, no later instruction can proceed
      DIV.D F0, F2, F4
      ADD.D F10, F0, F8
      SUB.D F12, F8, F14
• In order to allow out-of-order execution, the ID stage is split into two parts:
  – Instruction issue: decode the instruction and check for structural hazards
  – Read operands: read the operands once no data hazard remains

Dynamic scheduling
• Out-of-order execution introduces the possibility of WAR and WAW hazards
      DIV.D F0, F2, F4        DIV.D F0, F2, F4
      ADD.D F10, F0, F8       SUB.D F8, F8, F14
      SUB.D F8, F8, F14       ADD.D F10, F0, F8
• Out-of-order execution only improves performance if
  – Multiple instructions can be executed at once
  – Multiple functional units are available
• All instructions pass through the issue stage in order
• Instructions can be bypassed in the read-operand stage
• Algorithms allowing instructions to execute out-of-order
  – Scoreboarding
  – Tomasulo’s approach

Scoreboarding
• First implemented in the CDC 6600
• Assumption for the following slides:
  – 2 multipliers
  – 1 adder
  – 1 divider
  – 1 integer unit
• Each instruction goes through the scoreboard
  – Scoreboard determines when an instruction can execute
  – Scoreboard monitors usage of execution units
  – Scoreboard monitors when a result can be written to the destination register

Scoreboarding (II)
4 steps of scoreboarding (replaces ID, EX and WB)
1. Issue: if a functional unit is free and no other active instruction has the same destination register
2. Read operands: the scoreboard monitors the availability of operands. An operand is available if no earlier, active instruction is going to write it.
3. Execution
4. Write result: once execution is done, the scoreboard checks for WAR hazards and stalls the instruction if necessary.

Scoreboarding (III)
Scoreboard data structures:
• Instruction status: which of the four steps the instruction is in
• Functional unit status: status of a functional unit
  – Busy: indicates whether the unit is busy or not
  – Op: operation to be performed
  – Fi: destination register number
  – Fj, Fk: source register numbers
  – Qj, Qk: functional units producing source registers Fj, Fk
  – Rj, Rk: flags indicating whether Fj, Fk are ready. Set to No after operands are read.
• Register result status: which functional unit will write which register
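
The data structures listed above map naturally onto a record per functional unit plus a dictionary for the register result status. The following is a minimal sketch, not the CDC 6600 implementation: the class `FunctionalUnit`, the function `issue` and the choice to treat a missing operand as ready are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionalUnit:
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units that will produce Fj, Fk
    qk: Optional[str] = None
    rj: bool = False           # operand-ready flags
    rk: bool = False


def issue(fu: FunctionalUnit, result_status: Dict[str, str],
          op: str, dest: str, src_j: Optional[str], src_k: Optional[str]) -> bool:
    """Issue step bookkeeping; returns False if the instruction must stall."""
    if fu.busy or dest in result_status:       # structural hazard or WAW hazard
        return False
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, src_j, src_k
    fu.qj = result_status.get(src_j)           # None if no active unit writes Fj
    fu.qk = result_status.get(src_k)
    fu.rj = fu.qj is None                      # ready unless a producer is pending
    fu.rk = fu.qk is None
    result_status[dest] = fu.name              # register result status
    return True


status: Dict[str, str] = {}
integer, mult1 = FunctionalUnit("Integer"), FunctionalUnit("Mult1")
issue(integer, status, "Load", "F2", None, "R3")   # L.D F2, 45(R3)
issue(mult1, status, "Mult", "F0", "F2", "F4")     # MUL.D F0, F2, F4
print(mult1.qj, mult1.rj, mult1.rk)                # Integer False True
print(status)                                      # {'F2': 'Integer', 'F0': 'Mult1'}
```

Only the issue step is modelled here; reading operands, execution and writing results would update the same fields in later cycles.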

Scoreboarding example
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Assumption:
– ADD and SUB take 2 clock cycles
– MULT takes 10 clock cycles
– DIV takes 40 clock cycles
Following slides are based on a lecture by Jelena Mirkovic, University of Delaware: http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class10.pdf

Time=1: Issue first load
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          Yes
Mult1
Mult2
Add
Divide
Register result status: F6 = Integer

Time=2: First load reads operands; second load cannot issue (structural hazard)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          No
Mult1
Mult2
Add
Divide
Register result status: F6 = Integer

Time=3: First load completes execution; second load cannot issue (structural hazard)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6        R2                          No
Mult1
Mult2
Add
Divide
Register result status: F6 = Integer

Time=4: First load writes result; second load cannot issue (structural hazard)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add
Divide
Register result status: (no pending writes)

Time=5: Second load is issued
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          Yes
Mult1
Mult2
Add
Divide
Register result status: F2 = Integer

Time=6: Second load reads operands; Mult is issued
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          No
Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
Mult2
Add
Divide
Register result status: F0 = Mult1, F2 = Integer

Time=7: Second load completes execution; Mult is stalled waiting for F2; Sub is issued
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2        R3                          No
Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
Mult2
Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
Divide
Register result status: F0 = Mult1, F2 = Integer, F8 = Add

Time=8: Second load writes result; Mult and Sub stalled (F2); Div is issued
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
Mult2
Add      Yes   Sub   F8   F6   F2                     Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Time=9: Mult and Sub read operands; Div stalled waiting for F0; Add not issued (structural hazard)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Sub   F8   F6   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Time=10: Mult executing (1 of 10 cycles); Sub executing (1 of 2 cycles); Div stalled (F0)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Sub   F8   F6   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Time=11: Mult executing (2/10); Sub completes execution; Div stalled (F0)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Sub   F8   F6   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Time=12: Mult executing (3/10); Sub writes result; Div stalled (F0)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F10 = Divide

Time=13: Mult executing (4/10); Div stalled (F0); Add issued
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=14: Mult executing (5/10); Div stalled (F0); Add reads operands
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=15: Mult executing (6/10); Div stalled (F0); Add executes (1 of 2 cycles)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=16: Mult executing (7/10); Div stalled (F0); Add completes execution
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=17: Mult executing (8/10); Div stalled (F0); Add stalled (WAR hazard on F6)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=19: Mult completes execution; Div stalled (F0); Add stalled (WAR hazard on F6)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1    Yes   Mult  F0   F2   F4                     No   No
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Time=20: Mult writes result; Div stalled (F0); Add stalled (WAR hazard on F6)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status: F6 = Add, F10 = Divide

Time=21: Div reads operands; Add stalled (WAR hazard on F6)
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add      Yes   Add   F6   F8   F2                     No   No
Divide   Yes   Div   F10  F0   F6                     No   No
Register result status: F6 = Add, F10 = Divide

Time=22: Div executes (1/40); Add writes result
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add
Divide   Yes   Div   F10  F0   F6                     No   No
Register result status: F10 = Divide

Time=61: Div completes execution
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add
Divide   Yes   Div   F10  F0   F6                     No   No
Register result status: F10 = Divide

Time=62: Div writes result
Functional unit status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer
Mult1
Mult2
Add
Divide
Register result status: (no pending writes)
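
The whole walkthrough can be replayed with a short cycle-by-cycle simulation. The following is a simplified sketch and not the scoreboard hardware: it tracks only the four time stamps per instruction, and it assumes the program order implied by the trace (both loads first, ADD.D last), a load execution time of 1 cycle, 10 cycles for MULT, and that a state change made in cycle t becomes visible to other instructions in cycle t+1. All names (`Instr`, `simulate`, `UNITS`, `EXEC_CYCLES`) are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNITS = {"integer": 1, "mult": 2, "add": 1, "divide": 1}          # units per type
EXEC_CYCLES = {"integer": 1, "mult": 10, "add": 2, "divide": 40}  # assumed latencies


@dataclass
class Instr:
    text: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def written_before(instr: Instr, t: int) -> bool:
    return instr.write is not None and instr.write < t


def simulate(prog: List[Instr]) -> List[Instr]:
    t = 0
    while any(i.write is None for i in prog):
        t += 1
        for n, ins in enumerate(prog):
            if ins.issue is None or ins.write is not None:
                continue
            earlier = prog[:n]
            if ins.read is None:
                # Read operands: every earlier writer of a source has written (RAW).
                if all(written_before(e, t) for e in earlier if e.dest in ins.srcs):
                    ins.read = t
                    ins.done = t + EXEC_CYCLES[ins.unit]
            elif ins.done < t:
                # Write result: every earlier reader of dest has read it (WAR).
                if all(e.read is not None and e.read < t
                       for e in earlier if ins.dest in e.srcs):
                    ins.write = t
        # Issue in order: needs a free unit and no active writer of dest (WAW).
        k = next((n for n, i in enumerate(prog) if i.issue is None), None)
        if k is not None:
            active = [e for e in prog[:k] if not written_before(e, t)]
            unit_free = sum(e.unit == prog[k].unit for e in active) < UNITS[prog[k].unit]
            no_waw = all(e.dest != prog[k].dest for e in active)
            if unit_free and no_waw:
                prog[k].issue = t
    return prog


def example_program() -> List[Instr]:
    return [
        Instr("L.D F6, 34(R2)", "integer", "F6", ("R2",)),
        Instr("L.D F2, 45(R3)", "integer", "F2", ("R3",)),
        Instr("MUL.D F0, F2, F4", "mult", "F0", ("F2", "F4")),
        Instr("SUB.D F8, F6, F2", "add", "F8", ("F6", "F2")),
        Instr("DIV.D F10, F0, F6", "divide", "F10", ("F0", "F6")),
        Instr("ADD.D F6, F8, F2", "add", "F6", ("F8", "F2")),
    ]


# Columns: issue, read operands, execution complete, write result
for ins in simulate(example_program()):
    print(f"{ins.text:18} {ins.issue:3} {ins.read:3} {ins.done:3} {ins.write:3}")
```

Under these assumptions the computed time stamps agree with the captions of the walkthrough, for example ADD.D issuing at time 13 and writing its result only at time 22 because of the WAR hazard on F6.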

Scoreboarding (IV)
• Performance of scoreboarding depends on
  – The amount of parallelism available among instructions
  – Number of scoreboard entries
  – Number and type of functional units
  – Presence of antidependences and output dependences