pipelining and hazards

60
Pipelining and Hazards CS 3410, Spring 2014 Computer Science Cornell University ee P&H Chapter: 4.6-4.8

Upload: urian

Post on 23-Feb-2016

39 views

Category:

Documents


1 download

DESCRIPTION

Pipelining and Hazards. CS 3410, Spring 2014 Computer Science Cornell University. See P&H Chapter: 4.6-4.8. Announcements. Prelim next week Tuesday at 7:30. Upson B17 [a-e]*, Olin 255[f-m]*, Philips 101 [n-z]* Go based on netid Prelim reviews - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Pipelining and Hazards

Pipelining and Hazards

CS 3410, Spring 2014Computer ScienceCornell University

See P&H Chapter: 4.6-4.8

Page 2: Pipelining and Hazards

AnnouncementsPrelim next week

Tuesday at 7:30. Upson B17 [a-e]*, Olin 255[f-m]*, Philips 101 [n-z]*Go based on netid

Prelim reviewsFriday and Sunday evening. 7:30 again. Location: TBA on piazza

Prelim conflictsContact KB , Prof. Weatherspoon, Andrew Hirsch

SurveyConstructive feedback is very welcome

Page 3: Pipelining and Hazards

Administrivia

Prelim1:• Time: We will start at 7:30pm sharp, so come early• Loc: Upson B17 [a-e]*, Olin 255[f-m]*, Philips 101 [n-z]*• Closed Book

• Cannot use electronic device or outside material• Practice prelims are online in CMS

• Material covered everything up to end of this week• Everything up to and including data hazards• Appendix B (logic, gates, FSMs, memory, ALUs) • Chapter 4 (pipelined [and non] MIPS processor with hazards)• Chapters 2 (Numbers / Arithmetic, simple MIPS instructions)• Chapter 1 (Performance)• HW1, Lab0, Lab1, Lab2

Page 4: Pipelining and Hazards

Pipelining

Principle:Throughput increased by parallel executionBalanced pipeline very important

Else slowest stage dominates performance

Pipelining:• Identify pipeline stages• Isolate stages from each other• Resolve pipeline hazards (this and next lecture)

Page 5: Pipelining and Hazards

Basic PipelineFive stage “RISC” load-store architecture

1. Instruction fetch (IF)– get instruction from memory, increment PC

2. Instruction Decode (ID)– translate opcode into control signals and read registers

3. Execute (EX)– perform ALU operation, compute jump/branch targets

4. Memory (MEM)– access memory if needed

5. Writeback (WB)– update register file

Page 6: Pipelining and Hazards

Pipelined Implementation

• Each instruction goes through the 5 stages• Each stage takes one clock cycle• So slowest stage determines clock cycle time

Page 7: Pipelining and Hazards

Time Graphs1 2 3 4 5 6 7 8 9Clock cycle

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

add

lw

Page 8: Pipelining and Hazards

iClickerThe pipeline achievesA) Latency: 1, throughput: 1 instr/cycleB) Latency: 5, throughput: 1 instr/cycleC) Latency: 1, throughput: 1/5 instr/cycleD) Latency: 5, throughput: 5 instr/cycleE) None of the above

Page 9: Pipelining and Hazards

Time Graphs1 2 3 4 5 6 7 8 9Clock cycle

Latency: 5Throughput: 1 instruction/cycleConcurrency: 5

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

add

lw

Page 10: Pipelining and Hazards

Pipelined Implementation

• Each instruction goes through the 5 stages• Each stage takes one clock cycle• So slowest stage determines clock cycle time

• Stages must share information. How?• Add pipeline registers (flip-flops) to pass results

between different stages

Page 11: Pipelining and Hazards

Write-BackMemory

InstructionFetch Execute

InstructionDecode

extend

registerfile

control

Pipelined Processor

alu

memory

din dout

addrPC

memory

newpc

inst

IF/ID ID/EX EX/MEM MEM/WB

imm

BA

ctrl

ctrl

ctrl

BD D

M

computejump/branch

targets

+4

Page 12: Pipelining and Hazards

Pipelined Implementation

• Each instruction goes through the 5 stages• Each stage takes one clock cycle• So slowest stage determines clock cycle time

• Stages must share information. How?• Add pipeline registers (flip-flops) to pass results

between different stages

And is this it? Not quite….

Page 13: Pipelining and Hazards

Hazards3 kinds

• Structural hazards– Multiple instructions want to use same unit

• Data hazards– Results of instruction needed before ready

• Control hazards– Don’t know which side of branch to take

Will get back to thisFirst, how to pipeline when no hazards

Page 14: Pipelining and Hazards

IFStage 1: Instruction Fetch

Fetch a new instruction every cycle• Current PC is index to instruction memory• Increment the PC at end of cycle (assume no branches for

now)

Write values of interest to pipeline register (IF/ID)• Instruction bits (for later decoding)• PC+4 (for later computing branch targets)

Page 15: Pipelining and Hazards

IF

PC

instructionmemory

newpc

addr

+4

mc

00 = read word

Page 16: Pipelining and Hazards

IF

PC

instructionmemory

newpc

inst

addr mc

00 = read word

IF/ID

Rest

of p

ipel

ine

+4

PC+4

pcsel

pcregpcrel

pcabs

Page 17: Pipelining and Hazards

IDStage 2: Instruction Decode

On every cycle:• Read IF/ID pipeline register to get instruction bits• Decode instruction, generate control signals• Read from register file

Write values of interest to pipeline register (ID/EX)• Control information, Rd index, immediates, offsets, …• Contents of Ra, Rb• PC+4 (for computing branch targets later)

Page 18: Pipelining and Hazards

ID

ctrl

ID/EX

Rest

of p

ipel

ine

PC+4

inst

IF/ID

PC+4

Stag

e 1:

Inst

ructi

on F

etch

registerfile

WERd

Ra Rb

DB

A

BA

extend imm

decode

result

dest

Page 19: Pipelining and Hazards

EXStage 3: Execute

On every cycle:• Read ID/EX pipeline register to get values and control bits• Perform ALU operation• Compute targets (PC+4+offset, etc.) in case this is a branch• Decide if jump/branch should be taken

Write values of interest to pipeline register (EX/MEM)• Control information, Rd index, …• Result of ALU operation• Value in case this is a memory store instruction

Page 20: Pipelining and Hazards

Stag

e 2:

Inst

ructi

on D

ecod

e

pcrel

pcabs

EX

ctrl

EX/MEM

Rest

of p

ipel

ine

BD

ctrl

ID/EX

PC+4

BA

alu

+

branch?

imm

pcselpcreg

targ

et

Page 21: Pipelining and Hazards

MEMStage 4: Memory

On every cycle:• Read EX/MEM pipeline register to get values and control bits• Perform memory load/store if needed

– address is ALU result

Write values of interest to pipeline register (MEM/WB)• Control information, Rd index, …• Result of memory operation• Pass result of ALU operation

Page 22: Pipelining and Hazards

MEM

ctrl

MEM/WB

Rest

of p

ipel

ine

Stag

e 3:

Exe

cute

MD

ctrl

EX/MEM

BD

memory

din dout

addr

mctarg

et

branch?pcsel

pcrel

pcabs

pcreg

Page 23: Pipelining and Hazards

WBStage 5: Write-back

On every cycle:• Read MEM/WB pipeline register for values and control bits• Select value and write to register file

Page 24: Pipelining and Hazards

WBSt

age

4: M

emor

y

ctrl

MEM/WB

MD

result

dest

Page 25: Pipelining and Hazards

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

PC

instmem

Rd

Ra Rb

DB

A

Rd

Page 26: Pipelining and Hazards

Example: : Sample Code (Simple)

add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);

Page 27: Pipelining and Hazards

Example: Sample Code (Simple)

Assume eight-register machineRun the following code on a pipelined datapath

add r3 r1 r2 ; reg 3 = reg 1 + reg 2 nand r6 r4 r5 ; reg 6 = ~(reg 4 & reg 5) lw r4 20 (r2) ; reg 4 = Mem[reg2+20] add r5 r2 r5 ; reg 5 = reg 2 + reg 5 sw r7 12(r3) ; Mem[reg3+12] = reg 7

Slides thanks to Sally McKee

Page 28: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

op

Rt

imm

valB

valA

PC+4PC+4target

ALUresult

op

dest

valB

op

dest

ALUresult

mdata

instruction

0

R2

R3

R4

R5

R1

R6

R0

R7

regAregB

Bits 26-31

data

dest

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Rd

Page 29: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

nop

0

0

0

000

0

nop

0

0

nop

0

0

0

0

nop

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26-31

data

dest

InitialState

IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

At time 1, Fetchadd r3 r1 r2

Page 30: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

nop

0

0

0

040

0

nop

0

0

nop

0

0

0

0

add 3 1 2

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26-31

data

dest

Fetch: add 3 1 2

add 3 1 2

Time: 1 IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

/ 2

/ add

/ 3

/ 4

/ 36

/ 9

/ 2

Page 31: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

add

3

9

36

480

0

nop

0

0

nop

0

0

0

0nand 6 4 5

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

12

Bits 26-31

data

dest

Fetch: nand 6 4 5

nand 6 4 5 add 3 1 2

Time: 2 IF/ID ID/EX EX/MEM MEM/WB

extend

2MUX

3

/ 3

/ 45

/ add/ nand

/ 9/ 6

/ 8

/ 18

/ 7

/ 5 / 3

/ 4

36

9

3

Page 32: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

nand

6

7

18

8124

45

add

3

9

nop

0

0

0

0lw 4 20(2)

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

45

Bits 26-31

data

dest

Fetch: lw 4 20(2)

lw 4 20(2) nand 6 4 5 add 3 1 2

Time: 3

36

9

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

6 32

/ 45

/ 3

/ 4

nand ()

18 = 01 0010 7 = 00 0111------------------ -3 = 11 1101

/ -3

/ add

/ 18

/ 7

/ nand

/ 7

/ 6

/ 8

Page 33: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

lw

20

18

9

12168

-3

nand

6

7

add

3

45

0

0add 5 2 5

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

24

Bits 26-31

data

dest

Fetch: add 5 2 5

add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2

Time: 4

18

7

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

4MUX

0 65

Page 34: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

add

5

7

9

162012

29

lw

4

18

nand

6

-3

0

0sw 7 12(3)

945187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

25

Bits 26-31

data

dest

Fetch: sw 7 12(3)

sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 6 4 5 add 3 1 2

Time: 5

9

20

4

-3

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

5 04

Page 35: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

sw

12

22

45

20 16

16

add

5

7

lw

4

29

99

09

45187

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

37

Bits 26-31

data

dest

No moreinstructions

sw 7 12(3) add 5 2 5 lw 4 20(2) nand 6 4 5

Time: 6

9

7

5

29

4

-3

6

IF/ID ID/EX EX/MEM MEM/WB

extend

7MUX

0 55

Page 36: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

20

57

sw

7

22

add

5

16

0

09

45997

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26-31

data

dest

No moreinstructions

nop nop sw 7 12(3) add 5 2 5 lw 4 20(2)

Time: 7

45

7

12

16

5

99

4

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

07

Page 37: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

sw

7

57

0

9

459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 26-31

data

dest

No moreinstructions

nop nop nop sw 7 12(3) add 5 2 5

Time: 8

2257

22

16

5

Slides thanks to Sally McKee

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Page 38: Pipelining and Hazards

PC Instmem

Regi

ster

file

MUXA

LU

MUX

4

Datamem

+

MUX

Bits 11-15Bits 16-20

9

459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits 21-23

data

dest

No moreinstructions

nop nop nop nop sw 7 12(3)

Time: 9 IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Page 39: Pipelining and Hazards

Takeaway

Pipelining is a powerful technique to mask latencies and increase throughput• Logically, instructions execute one at a time• Physically, instructions execute in parallel

– Instruction level parallelism

Abstraction promotes decoupling• Interface (ISA) vs. implementation (Pipeline)

Page 40: Pipelining and Hazards

Hazards

See P&H Chapter: 4.7-4.8

Page 41: Pipelining and Hazards

Hazards3 kinds

• Structural hazards– Multiple instructions want to use same unit

• Data hazards– Results of instruction needed before

• Control hazards– Don’t know which side of branch to take

Page 42: Pipelining and Hazards

Data Hazards

What about data dependencies (also known as a data hazard in a pipelined processor)?i.e. add r3, r1, r2 sub r5, r3, r4

Need to detect and then fix such hazards

Page 43: Pipelining and Hazards

Why do data hazards occur?

Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• instruction may read (need) values that are being

computed further down the pipeline– In fact this is quite common

Page 44: Pipelining and Hazards

Data Hazards

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clock cycle1 2 3 4 5 6 7 8 9

sub r5, r3, r4

lw r6, 4(r3)

or r5, r3, r5

sw r6, 12(r3)

add r3, r1, r2

time

Page 45: Pipelining and Hazards

iClicker

sub r5, r3, r4

lw r6, 4(r3)

or r5, r3, r5

sw r6, 12(r3)

add r3, r1, r2 How many data hazards due to r3 only

A) 1B) 2C) 3D) 4E) 5

Page 46: Pipelining and Hazards

Data Hazards

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clock cycle1 2 3 4 5 6 7 8 9

2. sub r5, r3, r4

3. lw r6, 4(r3)

4. or r5, r3, r5

5. sw r6, 12(r3)

1. add r3, r1, r2

time

r3 = 10

r3 = 20

r3 = 10 r3 = 20

r3 = 10

r3 = 10

r3 = 10

OK

Page 47: Pipelining and Hazards

Data Hazards

What about data dependencies (also known as a data hazard in a pipelined processor)?i.e. add r3, r1, r2 sub r5, r3, r4

How to detect?

Page 48: Pipelining and Hazards

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addr

PC

instmem

Rd

Ra Rb

DB

A

detecthazard

Detecting Data Hazards

Rd

add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8

inst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

for rA (IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

Page 49: Pipelining and Hazards

Detecting Data Hazards

Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written

– In fact this is quite common

How to detect? (IF/ID.Ra != 0 && (IF/ID.Ra == ID/EX.Rd ||

IF/ID.Ra == EX/M.Rd || IF/ID.Ra == M/WB.Rd))

|| (same for Rb)

Page 50: Pipelining and Hazards

Next Goal

• What to do if data hazard detected?

• Options• Nothing• Change the ISA to match implementation

• Stall• Pause current and subsequent instructions till safe

• Forward/bypass• Forward data value to where it is needed

Page 51: Pipelining and Hazards

Stalling

How to stall an instruction in ID stage• prevent IF/ID pipeline register update

– stalls the ID stage instruction• convert ID stage instr into nop for later stages

– innocuous “bubble” passes through pipeline• prevent PC update

– stalls the next (IF stage) instruction

Page 52: Pipelining and Hazards

StallingClock cycle

1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

r3 = 10

r3 = 20

time

IF ID Ex M W

IF ID Ex M W

IF ID Ex M

ID ID ID

IF IF IF

IF ID Ex

Stalls3 Stall

Page 53: Pipelining and Hazards

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addr

PC

instmem

Rd

Ra Rb

DB

A

detecthazard

Detecting Data Hazards

Rd

add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8

inst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

If detect hazard

WE=0

MemWr=0RegWr=0

Page 54: Pipelining and Hazards

Stalling

datamem

B

A

B

D

M

Dinst

mem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

add r3,r1,r2

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

sub r5,r3,r5

or r6,r3,r4 (WE=0)

Page 55: Pipelining and Hazards

Stalling

datamem

B

A

B

D

M

Dinst

mem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

nop

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

add r3,r1,r2sub r5,r3,r5

(MemWr=0RegWr=0)

or r6,r3,r4 (WE=0)

Page 56: Pipelining and Hazards

Stalling

datamem

B

A

B

D

M

Dinst

mem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

(MemWr=0RegWr=0)

NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA==ID/Ex.Rd IF/ID.rA==Ex/M.Rd IF/ID.rA==M/W.Rd))

add r3,r1,r2sub r5,r3,r5 nop nop

(MemWr=0RegWr=0)

(MemWr=0RegWr=0)

or r6,r3,r4 (WE=0)

Page 57: Pipelining and Hazards

StallingClock cycle

1 2 3 4 5 6 7 8

add r3, r1, r2

sub r5, r3, r5

or r6, r3, r4

add r6, r3, r8

r3 = 10

r3 = 20

time

IF ID Ex M W

IF ID Ex M W

IF ID Ex M

ID ID ID

IF IF IF

IF ID Ex

Stalls3 Stall

Page 58: Pipelining and Hazards

Stalling

How to stall an instruction in ID stage• prevent IF/ID pipeline register update

– stalls the ID stage instruction• convert ID stage instr into nop for later stages

– innocuous “bubble” passes through pipeline• prevent PC update

– stalls the next (IF stage) instruction

Page 59: Pipelining and Hazards

Takeaway

Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards.

Stalling introduces NOPs (“bubbles”) into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file. Bubbles in pipeline significantly decrease performance.

Page 60: Pipelining and Hazards