![Page 1: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/1.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
m
Pipelined Architecturewith solutions
to data & control hazards
![Page 2: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/2.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mPipeline Processing Hazards
Structural Hazardhardware duplication
Data HazardPipeline StallSoftware (machine code) optimizationForwarding
Control HazardPipeline Flush (Instruction Invalidation)Delayed BranchingEarly Branch DetectionBranch History Table
![Page 3: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/3.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mPipeline Stall
AA
A
A
BCC
BA
IF ID EX Mem WB
BC BD C BE D C BF E D C B
A) ADD R1,R2,R3B) SUB R4,R3,R5C) MUL R4,R3,R1D) ...E) ...F) ...
time
Some stages must by repeated – other invalidatedReading the register being modified:
the same register can be referred to in ID (read) and in WB (write) stage – the writing can be done before (half clock cycle) reading
![Page 4: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/4.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall
Deactivation (→0) of control signals for stages:Ex, Mem and WBPostponed writing to PC and IF/ID
![Page 5: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/5.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mSoftware "Pipeline Stall"
Software correction of data flow with NOP (No Operation)not truly optimization, but might be occasionally necessary
when hardware mechanisms are insufficient
next: LW R1,0(R3) MUL R1,R1,R1 SW R1,0(R3) SUBI R3,#4,R3 BNE R0,R3,next ...
next: LW R1,0(R3) NOP MUL R1,R1,R1 NOP NOP SW R1,0(R3) SUBI R3,#4,R3 NOP NOP BNE R0,R3,next ...
all the data hazards"solved" with NOPs
![Page 6: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/6.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mSoftware Optimization for Architecture
Static: optimization at compilation time (optimising compiler)
e.g. gcc -On -march=xxxDynamic: at run-time: executing instructions in optimal order detected by hardware
dynamic schedulingrename registersout of order executionspeculative execution
Beyond
the sc
ope of
this lec
ture
![Page 7: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/7.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mGCC Settingsgcc -o test test.c -O3 -march=athlon
![Page 8: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/8.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mStatic Optimization Example
next: LW R1,0(R3) MUL R1,R1,R1 SW R1,0(R3) SUBI R3,#4,R3 BNE R0,R3,next ...
next: LW R1,0(R3) SUBI R3,#4,R3 MUL R1,R1,R1 BNE R0,R3,next SW R1,4(R3) ...
For pipelined architecture with Single Delay Slot
![Page 9: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/9.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding
Efficient hardware solution to most data hazards
![Page 10: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/10.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding
Idea: direct data access from intermediate registers
![Page 11: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/11.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mTransfers (without Forwarding)
Only "forward" direction and final register modification
![Page 12: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/12.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mTransfers (with Forwarding)
EX/MEM → ALUMEM/WB → ALU
![Page 13: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/13.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding
Hardware solution to most data hazards (between EX-MEM, EX-WB stages)Transfer of most-up-to-date results from Ex/Mem and Mem/WB to ALU inputHardware: combinatorial comparators of:
register numbers to be modified (Ex/Mem.Rd lub Mem/WB.Rd)
● withregister numbers of operands for ALU
(ID/Ex.Rs lub ID/Ex.Rt)Destination register is always updated in program order
![Page 14: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/14.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding
Multiplexers at ALU input are controlled by forwarding alone (not by main control unit)
Frowarding is transparent for control unit and does not increase its complexityFor "deep" (or parallel) pipelines, forwarding complexity grows and limits its practical application
![Page 15: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/15.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mPipelined Architecture with Forwarding
(no jumps yet)
![Page 16: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/16.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding in action (1)
![Page 17: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/17.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding in action (2)
![Page 18: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/18.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding in action (3)
![Page 19: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/19.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding in action (4)
![Page 20: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/20.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mForwarding – ALUsrc correction
ALUSrc is set by main control unit onlyAutonomous forwarding operation require two independent multiplexers for ALU second input
![Page 21: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/21.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
m"Hard" Data Hazards
Forwarding cannot solve all data hazardse.g. Read After Write (RAW) – here: LW & ADD
![Page 22: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/22.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
m"Hard" Data Hazards
Necessary pipeline stall
![Page 23: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/23.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall
Detection of hard data hazards must be done early (in ID)Additional RAW-hazard detection (combinatorial comparator) block is required in IDRAW-hazard detection block should be transparent for both main control and forwarding unitsRAW-hazard detects:
LW in stage EX (by examining ID/Ex.MemRead)conflicting instruction in ID (by opcode: R-type, SW, BEQ)matching numbers of registers:
● ID/Ex.Rt (LW destination) and ● IF/ID.Rs or IF/ID.Rt (conflicting instruction operands)
![Page 24: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/24.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall
![Page 25: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/25.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (1)
![Page 26: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/26.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (2)
![Page 27: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/27.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (3)
![Page 28: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/28.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (4)
![Page 29: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/29.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (5)
![Page 30: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/30.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mHardware Pipeline Stall in action (6)
![Page 31: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/31.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mControl Hazard
Any jump/branch breaks the natural sequence of instructions and spoils the pipeline (CPI > 1)Conditional branches (apart form address calculation) must also calculate the conditions – it may take timeJump/Branch execution will require a few following instructions to be invalidatedEffective solutions:
Early Branch Detection – requires additional hardwareBranch History Table – the best, but still based on guess
![Page 32: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/32.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mControl Hazard
Late branch detection (our unmodified architecture): branch condition evaluated at EX, active at MEM,
target instruction fetched after 3 cycles of delay
![Page 33: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/33.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mEarly Branch Detection Hardware
![Page 34: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/34.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mEarly Branch Detection
Condition (simple) is calculated in ID stage – only one stage of delay will be introduced (instruction in IF)Only simple condition is allowed (e.g. comparison), since the registers must be read from register fileAdditional address needed in ID – dedicated for jump/branch address calculationInstruction in IF must be invalidated – turned into NOP (effectively the same as invalidation)Invalidation in IF stage (→NOP) requires clearing the IF/ID intermediate register
providing, the NOP bit pattern (opcode + rest) is all 0's
![Page 35: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/35.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mEarly Branch Detection in action (1)
![Page 36: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/36.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mEarly Branch Detection in action (2)
![Page 37: Pipelined Architecture with solutions to data & control hazards](https://reader030.vdocuments.site/reader030/viewer/2022020706/61fc88208d33c02b785e43b6/html5/thumbnails/37.jpg)
Com
pute
r Arc
hite
ctur
e, IF
E CS
and
T&
CS, 4
thse
mBranch History Table (BHT)
IF ID EX
IF ID
jump next
instr_next EX WBMEM
Tablica skoków (jump table)
BHT entry: recent branch instruction address & validated target address
(+) No need to use early detection hardware(+) Complex and late condition calculation is allowed(+) No processing delay at all(-) Target is still a guess and requires validation(-) Misprediction causes invalidation of many instructions(-) Complex prediction strategies are needed (hardware)