
1

CA226 — Advanced Computer Architecture

Stephen Blott



2

…

Today:

• data hazards


3

…

Recall:

• the MIPS pipeline implements instruction level parallelism

• ideally, up to five instructions are executed (in part) on any clock cycle

• if one instruction were to exit the pipeline on each cycle:

• then the CPI would be 1

and, ideally, the MIPS pipeline approaches a CPI of 1


4

MIPS Pipeline


5

Example

    daddi r1,r1,1
    daddi r2,r2,1
    daddi r3,r3,1
    daddi r4,r4,1
    daddi r5,r5,1

Note

Note to self: see pipeline.s.


6

Speedup

Ideally:

• each instruction takes 5 cycles to execute

• however, 5 instructions are in the pipeline

• so the number of cycles per instruction approaches 1

Note

Note to self: observe the effect on CPI of repeating the previous block of instructions.
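The claim that the CPI approaches 1 can be checked with a little arithmetic. The sketch below assumes an ideal five-stage pipeline with no stalls, in which the first instruction takes 5 cycles and every later instruction completes one cycle after its predecessor. The function names are illustrative only and do not come from the simulator.

```python
def pipeline_cycles(n_instructions, stages=5, stalls=0):
    """Total cycles for n instructions to pass through the pipeline."""
    # The first instruction needs `stages` cycles; each later one
    # finishes one cycle after the instruction before it.
    return stages + (n_instructions - 1) + stalls


def cpi(n_instructions, stages=5, stalls=0):
    """Average cycles per instruction, including fill and drain."""
    return pipeline_cycles(n_instructions, stages, stalls) / n_instructions


for n in (5, 50, 500):
    print(n, pipeline_cycles(n), round(cpi(n), 3))
```

For the five-instruction block this gives 9 cycles (a CPI of 1.8); repeating the block makes the fixed cost of filling the pipeline matter less, so the CPI drifts towards 1.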


7

Hazards

The major hurdle to effective pipeline implementation is:

• hazards


8

Types of Hazard

Structural hazards
    resource conflicts; the hardware cannot support all instruction combinations simultaneously

Data hazards
    when one instruction depends upon the result (which is not yet available) of a previous instruction (today)

Control hazards
    when the address of the next instruction cannot be determined immediately


9

Data Hazards — Example

Consider:

    dadd r1,r2,r3     ; instruction 1
    dsub r4,r1,r5     ; instruction 2
    and r6,r1,r5      ; instruction 3
    or r8,r1,r9       ; instruction 4
    xor r10,r1,r11    ; instruction 5

Instructions 2, 3, 4 and 5:

• each depends upon the result of instruction 1


10

Ok …

Turn off forwarding, and let’s try running that …

Note to self:

• see hazards1.s.


11

Illustration

Table 1. Two Read-After-Write (RAW) pipeline stalls:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem    WB*
dsub r4,r1,r5            IF     ID     RAW    RAW    *Ex
and r6,r1,r5                    IF     stall  stall  ID
or r8,r1,r9                                          IF

Note

This assumes that we can both write and read the register file in a single clock cycle. Typically, the write happens in the first half of the cycle, and the read in the second half.


12

Observations

This is known as a read after write (or RAW) stall:

• instruction 2 is blocked at ID because one of its arguments (registers) is not yet available

• in this case, all subsequent instructions are blocked too, which is known as a pipeline stall


13

Next, …

Consider:

• the effect of replacing instruction 2 with a nop instruction (or any other, non-dependent instruction)


14

Illustration

Table 2. Still one RAW stall:

                  1      2      3      4      5      6      7
nop                      IF     ID     Ex     Mem    WB
and r6,r1,r5                    IF     ID     RAW    *Ex    Mem
or r8,r1,r9                            IF     stall  ID     Ex


15

Next, …

Finally, consider:

• the effect of replacing instruction 3 with a nop instruction (or any other, non-dependent instruction)


16

Illustration

Table 3. No stalls:

                  1      2      3      4      5      6      7
nop                      IF     ID     Ex     Mem    WB
nop                             IF     ID     Ex     Mem
or r8,r1,r9                            IF     ID     *Ex    Mem
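The stall counts in Tables 1 to 3 can be reproduced with a small timing model. This is only a sketch under stated assumptions: a five-stage in-order pipeline, no forwarding, and a register file that is written in the first half of a cycle and read in the second half. The tuple encoding of instructions and the function names are hypothetical, not part of WinMIPS64.

```python
# Assumed model: 5 stages, no forwarding; a register written in WB can be
# read by an instruction that is in ID during that same cycle.
def ex_cycles_no_forwarding(program):
    """program: list of (dest, sources); dest is None for a nop.
    Returns the clock cycle in which each instruction enters Ex."""
    writer_ex = {}  # register -> Ex cycle of its most recent writer
    ex = []
    for dest, sources in program:
        # In-order issue: first instruction has IF=1, ID=2, Ex=3.
        earliest = ex[-1] + 1 if ex else 3
        for reg in sources:
            if reg in writer_ex:
                # The writer's WB is at Ex+2; the reader's last ID cycle may
                # coincide with it, so the reader's Ex is at least Ex+3.
                earliest = max(earliest, writer_ex[reg] + 3)
        ex.append(earliest)
        if dest is not None:
            writer_ex[dest] = earliest
    return ex


def stall_cycles(program):
    """Cycles lost by the last instruction relative to a stall-free run."""
    ex = ex_cycles_no_forwarding(program)
    return ex[-1] - (3 + len(program) - 1)


DADD = ("r1", ("r2", "r3"))
DSUB = ("r4", ("r1", "r5"))
AND = ("r6", ("r1", "r5"))
OR = ("r8", ("r1", "r9"))
NOP = (None, ())

print(stall_cycles([DADD, DSUB, AND, OR]))  # sequence of Table 1
print(stall_cycles([DADD, NOP, AND, OR]))   # sequence of Table 2
print(stall_cycles([DADD, NOP, NOP, OR]))   # sequence of Table 3
```

Under these assumptions the three calls print 2, 1 and 0, which agrees with the table captions: each independent instruction placed after the dadd hides one stall cycle.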


17

…

We could:

• find (two) other (independent) instructions to insert between such write-read dependencies

• but such dependencies are common, and we rarely have enough instructions to fill the gaps


18

…

However, such hazards are not insurmountable:

• the ALU produces the necessary value in cycle 3 (although it is not written back to the register file until cycle 5)

• that value is not needed by instruction 2 until cycle 4


19

…

Table 4. The value is available after cycle 3:

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB*
dsub r4,r1,r5            IF     ID     RAW    RAW    *Ex
and r6,r1,r5                    IF     stall  stall  ID
or r8,r1,r9                                          IF


20

Forwarding

Solution:

• data paths are added:

  • EX/Mem.ALUOutput → ID/EX.A (output)
  • EX/Mem.ALUOutput → ID/EX.B (output)
  • Mem/WB.ALUOutput → ID/EX.A (output)
  • Mem/WB.ALUOutput → ID/EX.B (output)

• when a read-after-write is detected, the ALU input (either ID/EX.A or ID/EX.B) is switched to one of the two available ALUOutput pipeline registers (EX/Mem or Mem/WB)
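The selection made for each ALU input can be sketched in software. This is a minimal illustration, not the hardware design: the `Latch` type and `select_alu_input` function are hypothetical names, and the sketch assumes the usual MIPS convention that r0 is hard-wired to zero and is therefore never forwarded.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    """Simplified view of a pipeline register (hypothetical type)."""
    dest: Optional[str]  # register written by the instruction held here
    alu_output: int      # its ALUOutput value


def select_alu_input(src_reg, regfile_value, ex_mem, mem_wb):
    """Return (value, origin) for one ALU operand (ID/EX.A or ID/EX.B)."""
    if src_reg != "r0":  # assumption: r0 is constant zero, never forwarded
        # EX/Mem holds the immediately preceding instruction, so its
        # result is the newest and takes priority.
        if ex_mem.dest == src_reg:
            return ex_mem.alu_output, "EX/Mem.ALUOutput"
        if mem_wb.dest == src_reg:
            return mem_wb.alu_output, "Mem/WB.ALUOutput"
    # No read-after-write dependency: use the value read during ID.
    return regfile_value, "ID/EX"


# dsub r4,r1,r5 in Ex while dadd r1,r2,r3 sits in EX/Mem:
print(select_alu_input("r1", 0, Latch("r1", 42), Latch(None, 0)))
```

Checking EX/Mem before Mem/WB matters when two earlier instructions both write the same register: the consumer must see the most recent value.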


21

MIPS Pipeline


22

Forwarding

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB
dsub r4,r1,r5            IF     ID     **Ex   Mem    WB
and r6,r1,r5                    IF     ID     Ex     Mem    WB
or r8,r1,r9                            IF     ID     Ex     Mem

One of:

• EX/Mem.ALUOutput → ID/EX.A
• EX/Mem.ALUOutput → ID/EX.B


23

Forwarding

                  1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex     Mem**  WB
nop                      IF     ID     Ex     Mem    WB
and r6,r1,r5                    IF     ID     **Ex   Mem    WB
or r8,r1,r9                            IF     ID     Ex     Mem

One of:

• Mem/WB.ALUOutput → ID/EX.A
• Mem/WB.ALUOutput → ID/EX.B


24

The WinMIPS64 Simulator

The WinMIPS64 simulator:

• supports forwarding; it can be either enabled or disabled

• see: Configure/Enable Forwarding


25

…

Try turning on forwarding:

• and running the example again … (hazards1.s)


26

Now, consider the following …

    daddi r1,r2,123    ; instruction 1
    ld r4,0(r1)        ; instruction 2
    sd r4,8(r1)        ; instruction 3

Here:

• there is a RAW dependency between the daddi instruction and the address calculation in both of the following instructions

• the address calculation is handled by the ALU, so these are handled by forwarding, as before


27

Illustration

Table 5. No stalls due to address calculation:

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex**   Mem++  WB
ld r4,0(r1)              IF     ID     **Ex   Mem    WB
sd r4,8(r1)                     IF     ID     ++Ex   Mem    WB

• EX/Mem.ALUOutput → ID/EX.A for cycle 4
• Mem/WB.ALUOutput → ID/EX.A for cycle 5



29

And, again …

    daddi r1,r2,123    ; instruction 1
    ld r4,0(r1)        ; instruction 2
    sd r4,8(r1)        ; instruction 3

Also:

• the sd instruction depends upon the result of the ld


30

…

Table 6. This can be solved by forwarding too:

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex     Mem    WB
ld r4,0(r1)              IF     ID     Ex     Mem**  WB
sd r4,8(r1)                     IF     ID     Ex     **Mem  WB

Here:

• Mem/WB.LMD → EX/MEM.B for cycle 6


31

In full …

                  1      2      3      4      5      6      7
daddi r1,r2,123   IF     ID     Ex++   Mem==  WB
ld r4,0(r1)              IF     ID     ++Ex   Mem**  WB
sd r4,8(r1)                     IF     ID     ==Ex   **Mem  WB

• EX/Mem.ALUOutput → ID/EX.A for cycle 4
• Mem/WB.ALUOutput → ID/EX.A for cycle 5
• Mem/WB.LMD → EX/MEM.B for cycle 6


32

…

In all:

• four pipeline stalls are eliminated (note to self: see stalls1.s)
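The figure of four eliminated stalls can be checked with a rough timing model of the daddi/ld/sd sequence. The sketch assumes a five-stage in-order pipeline in which ALU results can be forwarded after Ex, loaded values after Mem, and store data is not needed until the store's own Mem stage. Without forwarding, a consumer is assumed to wait in ID until the producer's WB cycle. The instruction encoding and function names are hypothetical.

```python
# Assumed model: 5-stage in-order pipeline; a stall delays an instruction's Ex.
def ready_gap(producer_kind, needed_in_mem, forwarding):
    """Minimum distance in cycles between producer's Ex and consumer's Ex."""
    if not forwarding:
        return 3  # wait for WB; the read in ID can share the WB cycle
    gap = 2 if producer_kind == "load" else 1  # loads deliver after Mem
    return gap - 1 if needed_in_mem else gap   # store data is needed in Mem


def ex_cycles(program, forwarding):
    """program: list of (kind, dest, ex_sources, mem_sources)."""
    producers = {}  # register -> (kind, Ex cycle) of its most recent writer
    ex = []
    for kind, dest, ex_sources, mem_sources in program:
        earliest = ex[-1] + 1 if ex else 3  # first instruction: Ex in cycle 3
        needs = [(r, False) for r in ex_sources] + [(r, True) for r in mem_sources]
        for reg, needed_in_mem in needs:
            if reg in producers:
                p_kind, p_ex = producers[reg]
                gap = ready_gap(p_kind, needed_in_mem, forwarding)
                earliest = max(earliest, p_ex + gap)
        ex.append(earliest)
        if dest is not None:
            producers[dest] = (kind, earliest)
    return ex


def stall_cycles(program, forwarding):
    """Cycles lost by the last instruction relative to a stall-free run."""
    return ex_cycles(program, forwarding)[-1] - (3 + len(program) - 1)


program = [
    ("alu", "r1", ["r2"], []),        # daddi r1,r2,123
    ("load", "r4", ["r1"], []),       # ld r4,0(r1)
    ("store", None, ["r1"], ["r4"]),  # sd r4,8(r1): r1 used in Ex, r4 in Mem
]
without = stall_cycles(program, forwarding=False)
with_fw = stall_cycles(program, forwarding=True)
print(without, with_fw, without - with_fw)
```

In this model the sequence loses four cycles without forwarding (two waiting for r1, two waiting for r4) and none with it.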


33

MIPS Pipeline


34

Unfortunately …

Forwarding cannot solve all RAW problems:

    ld r1,n(r0)
    dadd r2,r1,r0


35

…

Table 7. You can’t forward backwards in time:

                  1      2      3      4      5      6      7
ld r1,n(r0)       IF     ID     Ex     Mem**  WB
dadd r2,r1,r0            IF     ID     **Ex   Mem    WB

Clearly:

• this is not possible


36

An Insurmountable Stall

Table 8. An inevitable stall of one cycle:

ld r1,n(r0)       IF     ID     Ex     Mem**  WB
dadd r2,r1,r0            IF     ID     RAW    **Ex   Mem


37

More generally, …

Unlike arithmetic instructions:

• loads yield values only after the Mem stage of the pipeline, so stalls at Ex cannot be avoided


38

Suggestion

When possible, replace:

    dadd r3,r2,r1    ; some other, unrelated instruction
    ld r4,N(r0)
    dadd r6,r5,r4    ; stall - can't forward backwards!


39

Suggestion

With:

    ld r4,N(r0)
    dadd r3,r2,r1    ; some other, unrelated instruction
    dadd r6,r5,r4    ; doesn't stall - can forward from ld

Now:

• when the final dadd reaches Ex, Mem/WB.LMD is available for forwarding


40

…

Note

A good compiler (or you!) should be able to spot such stalls and reorder the operations.

We spot such stalls by observing that an ALU instruction immediately follows a load upon which it depends.
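The rule for spotting such stalls is mechanical enough to sketch as a small checker. This is an illustration only: the parser handles just the instruction forms used in these slides, every opcode other than ld, sd and nop is assumed to be an ALU instruction, and the function names are hypothetical.

```python
import re


def parse(line):
    """Split e.g. 'ld r4,N(r0)' into (opcode, dest, sources).
    Anything after ';' is treated as a comment."""
    text = line.split(";")[0].strip()
    parts = text.split(None, 1)
    opcode = parts[0]
    regs = re.findall(r"\br\d+\b", parts[1]) if len(parts) > 1 else []
    if opcode == "sd" or not regs:
        return opcode, None, regs  # stores (and nop) write no register
    return opcode, regs[0], regs[1:]


def load_use_pairs(lines):
    """Indices of ALU instructions that immediately follow a load
    whose destination register they read."""
    parsed = [parse(line) for line in lines]
    hits = []
    for i in range(1, len(parsed)):
        prev_op, prev_dest, _ = parsed[i - 1]
        op, _, sources = parsed[i]
        is_alu = op not in ("ld", "sd", "nop")  # simplifying assumption
        if prev_op == "ld" and is_alu and prev_dest in sources:
            hits.append(i)
    return hits


before = ["dadd r3,r2,r1", "ld r4,N(r0)", "dadd r6,r5,r4"]
after = ["ld r4,N(r0)", "dadd r3,r2,r1", "dadd r6,r5,r4"]
print(load_use_pairs(before))  # the final dadd is flagged
print(load_use_pairs(after))   # nothing is flagged
```

The checker deliberately implements only the rule stated above; a load or store whose address register comes from the preceding load would also stall, but it is not reported here.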


41

Example

Compile:

    int a = b + c;
    int d = e + f;

Note to self:

• see psched1.s and psched2.s.


42

Example

First, spot the problem:

    ld r1,b(r0)      ; a = b + c
    ld r2,c(r0)
    dadd r5,r1,r2
    sd r5,a(r0)

    ld r1,e(r0)      ; d = e + f
    ld r2,f(r0)
    dadd r5,r1,r2
    sd r5,d(r0)


43

Example

Then, rewrite instructions such that there are no stalls:

    ld r1,b(r0)      ; a = b + c
    ld r2,c(r0)
    dadd r5,r1,r2    ; stall, r2 not ready
    sd r5,a(r0)

    ld r1,e(r0)      ; d = e + f
    ld r2,f(r0)
    dadd r5,r1,r2    ; stall, r2 not ready
    sd r5,d(r0)


44

Example

Well, it’s helpful to use different registers:

    ld r1,b(r0)      ; a = b + c
    ld r2,c(r0)
    dadd r5,r1,r2    ; stall, r2 not ready
    sd r5,a(r0)

    ld r3,e(r0)      ; d = e + f
    ld r4,f(r0)
    dadd r5,r3,r4    ; stall, r4 not ready
    sd r5,d(r0)


45

Example

No stalls:

    ld r1,b(r0)
    ld r2,c(r0)
    ld r3,e(r0)      ; prevent stall (pulled up)
    dadd r5,r1,r2    ; no stall

    ld r4,f(r0)
    sd r5,a(r0)      ; prevent stall (pushed down)
    dadd r5,r3,r4    ; no stall
    sd r5,d(r0)


46

…

This is known as:

• pipeline scheduling

In this case:

• use two extra registers

• avoid two stalls

• 13 cycles, instead of 15


47

Aside

The "13 versus 15 cycles" statement is misleading:

• it includes cycles for the pipeline to fill and empty

Actually:

• disregarding the filling of the pipeline:

• it’s 8 cycles, instead of 10

so a speedup of 1.25
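The two ways of counting give noticeably different speedups, which a short calculation makes concrete. The sketch uses only the cycle counts quoted above; the helper name is illustrative.

```python
def speedup(cycles_before, cycles_after):
    """Ratio of execution times, before versus after scheduling."""
    return cycles_before / cycles_after


# Counting pipeline fill and drain: 15 cycles before, 13 after.
print(round(speedup(15, 13), 3))

# Steady state only: one cycle per instruction plus any stalls.
instructions = 8
stalls_before = 2
print(speedup(instructions + stalls_before, instructions))
```

The first figure is about 1.154 and the second is 1.25. On such a short sequence the fixed cost of filling the pipeline dilutes the benefit; in a long-running loop that cost is amortised, so the steady-state figure is the more representative one.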


48

Summary 1

Forwarding is simple:

• if the necessary data is available somewhere in the pipeline when it is needed, then it can be forwarded to where it’s needed

The implementation in hardware of these strategies is an engineering decision:

• it is correct, in all cases, to stall the pipeline when such hazards are detected

• forwarding, however, improves performance at the cost of some additional complexity


49

Summary 2

Some types of (RAW) stall are unavoidable:

• however, it is often possible to reorder instructions such that they do not occur


50

Done
