Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Zhao Zhang Iowa State University Revised from original slides provided by MKP




Week 10 Overview

- Expected project progress: complete Mini-Project B, part 1
- ALU data hazard and forwarding
- MEM data hazard, forwarding, and pipeline stall
- Control hazard and branch execution


Data Hazards from ALU Instructions

An instruction depends on completion of a data access by a previous instruction:

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Consider this sequence:

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)


Data Hazards from ALU Instructions

A naïve approach is to insert nops to wait out the dependence:

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Change to:

    add $s0, $t0, $t1
    nop
    nop
    sub $t2, $s0, $t3
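The nop approach can be sketched as a small program transformation. The helper `insert_nops` and the `Instr` record are hypothetical names, not from the course material. The sketch assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second half, so a consumer must trail its producer by at least three instruction slots. Registers are plain MIPS register numbers.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str              # assembly text, for display only
    dest: Optional[int]    # register number written, or None
    srcs: Tuple[int, ...]  # register numbers read


NOP = Instr("nop", None, ())


def insert_nops(program: List[Instr], min_distance: int = 3) -> List[Instr]:
    """Insert nops so that every instruction trails the producer of each of its
    source registers by at least `min_distance` slots."""
    out: List[Instr] = []
    for instr in program:
        needed = 0
        # Only the last (min_distance - 1) emitted slots can still conflict.
        for back in range(1, min(min_distance, len(out) + 1)):
            prev = out[-back]
            # dest None (no write) or 0 ($zero) never creates a dependence.
            if prev.dest and prev.dest in instr.srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


# Register numbers: $t0 = 8, $t1 = 9, $t2 = 10, $t3 = 11, $s0 = 16
program = [
    Instr("add $s0, $t0, $t1", 16, (8, 9)),
    Instr("sub $t2, $s0, $t3", 10, (16, 11)),
]
for instr in insert_nops(program):
    print(instr.text)  # add, nop, nop, sub
```

Applied to the five-instruction sequence above, the same rule inserts two nops before `and`; after that, `or` is already far enough from `sub`.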


Data Hazards in ALU Instructions

Another naïve approach is to stall the 2nd instruction in the dependence:

    add $s0, $t0, $t1
    sub $t2, $s0, $t3


Data Hazards in ALU Instructions

Observations on this scenario:
- The first instruction, an ALU instruction, produces a register value
- The following instruction(s) consume the register value

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)


Data Hazards in ALU Instructions

What exactly is the problem?
- A register value is written to the register file in the WB stage, two cycles after the EX stage
- The following instructions read the register value in the ID stage

[Figure: pipeline snapshots over successive cycles (stages IF, ID, EX, MEM, WB). sub reads $1 and $3 in ID; AND and OR read the old $2 in ID; in the cycle in which sub writes $2 in WB, add reads the new $2 in ID.]
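The timing argument can be checked mechanically. This is a minimal sketch assuming one instruction enters IF per cycle, no stalls, no forwarding, and a register file that writes in the first half of a cycle and reads in the second half. `stage_cycle` and `stale_reads` are illustrative names; each instruction is a `(text, dest, srcs)` tuple of register numbers.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def stage_cycle(index: int, stage: str) -> int:
    """1-based cycle in which instruction `index` (0-based) occupies `stage`."""
    return index + STAGES.index(stage) + 1


def stale_reads(program):
    """Return (text, reg) pairs that would read an old register value."""
    stale = []
    for j, (text, _, srcs) in enumerate(program):
        for reg in dict.fromkeys(srcs):   # distinct sources, order kept
            if reg == 0:                  # $zero never changes
                continue
            # Most recent earlier instruction that writes `reg`, if any.
            producer = next((i for i in range(j - 1, -1, -1)
                             if program[i][1] == reg), None)
            # Reads happen in ID and writes in WB; a read in the same cycle as
            # the write sees the new value (write first half, read second half).
            if producer is not None and stage_cycle(j, "ID") < stage_cycle(producer, "WB"):
                stale.append((text, reg))
    return stale


sequence = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]
print(stale_reads(sequence))  # only and and or read the old $2
```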


Data Hazards in ALU Instructions


Forwarding (aka Bypassing)

- Use the result as soon as it is computed
  - The result is already in the pipeline
  - Don't wait for it to be stored in a register
- Requires extra connections in the datapath


Dependencies & Forwarding


Data Forwarding

- To where: the two ALU inputs in the EX stage datapath
  - A forwarded register value may replace the value read in ID
- From where: the destination register value held in the pipeline registers
  - Source 1: EX/MEM register
  - Source 2: MEM/WB register


Data Forwarding

- When to forward: a data dependence is detected between
  - the instructions at the EX and MEM stages, or
  - the instructions at the EX and WB stages
- How to detect: compare source and destination register numbers


Data Forwarding Example

    sub $2, $1, $3
    and $12, $2, $5    # $2: MEM => EX forwarding
    or  $13, $6, $2    # $2: WB => EX forwarding
    add $14, $2, $2
    sw  $15, 100($2)


[Figure: pipeline snapshots (stages IF, ID, EX, MEM, WB). With sub in MEM and and in EX, AND gets the forwarded new $2 value; one cycle later, with sub in WB and or in EX, OR gets the forwarded new $2 value.]
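The example can be annotated programmatically with where each read of a register gets its value once forwarding exists. This sketch assumes every producer is an ALU instruction (a load right before its consumer needs a stall instead) and uses the hypothetical helper `forwarding_sources`. Distance 1 corresponds to the MEM => EX path (EX/MEM register) and distance 2 to the WB => EX path (MEM/WB register).

```python
def forwarding_sources(program):
    """program: list of (text, dest, srcs) tuples, all producers ALU instructions.
    Returns (text, reg, source) for every distinct source register read."""
    result = []
    for j, (text, _, srcs) in enumerate(program):
        for reg in dict.fromkeys(srcs):           # distinct sources, in order
            producer = None
            if reg != 0:                          # $zero is never forwarded
                producer = next((i for i in range(j - 1, -1, -1)
                                 if program[i][1] == reg), None)
            distance = None if producer is None else j - producer
            if distance == 1:
                source = "EX/MEM"         # producer in MEM while consumer in EX
            elif distance == 2:
                source = "MEM/WB"         # producer in WB while consumer in EX
            else:
                source = "register file"  # no producer, or already written back
            result.append((text, reg, source))
    return result


sequence = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]
for text, reg, source in forwarding_sources(sequence):
    if reg == 2:
        print(f"{text}: ${reg} from {source}")
```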


Data Forwarding Logic Design

    sub $2, $1, $3
    and $12, $2, $5    # compare $2 with $2, $5
    or  $13, $6, $2    # compare $2 with $6, $2

- Detection: compare rs and rt at EX with rd at MEM and rd at WB
- Those register numbers are in the ID/EX, EX/MEM, and MEM/WB registers
  - rs was not in the ID/EX register; we can add it


Data Forwarding Logic Design

Register numbers in the pipeline:
- Source registers of the instruction at the EX stage: ID/EX.RegisterRs, ID/EX.RegisterRt
- Destination register of the instruction at the MEM stage: EX/MEM.RegisterRd
- Destination register of the instruction at the WB stage: MEM/WB.RegisterRd

Potential data hazards when:
- Forward from the EX/MEM pipeline register:
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
- Forward from the MEM/WB pipeline register:
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt


Data Forwarding Logic Design

- But only if the forwarding instruction will write to a register!
  - EX/MEM.RegWrite = 1, MEM/WB.RegWrite = 1
  - An instruction may have a matching rd but not write to a register
- And only if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
  - An instruction is allowed to write to $0, but $0 always reads as zero, so such a result must not be forwarded


Forwarding Paths

The forwarding unit accesses three pipeline registers. Note that rs is added to the ID/EX pipeline register.


Forwarding Conditions

EX hazard: data forwarding from the EX/MEM register

    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10

MEM hazard: data forwarding from the MEM/WB register

    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

This is not the final version (see Revised Forwarding Condition).
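Read literally as four `if` statements evaluated top to bottom, the first-version conditions can be sketched as below. `PipeDest` and `forward_first_version` are hypothetical names; the two-bit select values follow the slide (10 = EX/MEM, 01 = MEM/WB, 00 = value read in ID). The top-to-bottom evaluation order is an assumption about how the four statements combine.

```python
from dataclasses import dataclass


@dataclass
class PipeDest:
    """Destination fields of a pipeline register (EX/MEM or MEM/WB)."""
    reg_write: bool
    rd: int


def forward_first_version(rs, rt, ex_mem, mem_wb):
    """First-version conditions, evaluated top to bottom; a later assignment
    overwrites an earlier one. rs and rt are ID/EX.RegisterRs/RegisterRt.
    Returns (ForwardA, ForwardB)."""
    forward_a = forward_b = "00"  # 00: keep the value read in ID
    # EX hazard: forward from EX/MEM
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        forward_a = "10"
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rt:
        forward_b = "10"
    # MEM hazard: forward from MEM/WB
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        forward_a = "01"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rt:
        forward_b = "01"
    return forward_a, forward_b


# and $12, $2, $5 in EX; sub $2, $1, $3 in MEM; nothing writing in WB
print(forward_first_version(2, 5, PipeDest(True, 2), PipeDest(False, 0)))  # ('10', '00')
```

Calling `forward_first_version(1, 4, PipeDest(True, 1), PipeDest(True, 1))`, which models the third add of the Double Data Hazard sequence, returns `('01', '00')`: the MEM test runs last and selects the older value. That is why the MEM hazard condition gets revised.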


Datapath with Forwarding


Caveats

- Data forwarding happens at the beginning of the cycle
- The forwarding unit is in the EX stage, with its inputs from three pipeline stages
  - It adds a small overhead to the critical-path latency of the EX stage
- For an EX hazard, data forwarding is from MEM to EX
  - Precisely, the register value of the instruction being executed at the MEM stage is forwarded to the instruction being executed at the EX stage


Caveats

- For a MEM hazard, the forwarding is from WB to EX
  - From the instruction at WB to the instruction at EX
- Data forwarding is to EX, not to ID
  - An instruction may read obsolete register values at ID, and those values are latched in the ID/EX register
  - The correct values may be in the EX/MEM register (EX hazard) or the MEM/WB register (MEM hazard)
  - Any obsolete values get replaced at EX
- There is no WB hazard
  - A register write at WB and a register read at ID, for the same register, can complete within one cycle


Double Data Hazard

Consider the sequence:

    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4

- Both hazards occur
  - Want to use the most recent value
- Revise the MEM hazard condition
  - Only forward if the EX hazard condition isn't true


Revised Forwarding Condition

MEM hazard (revision of the MEM hazard condition under Forwarding Conditions)

    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
        ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
        ForwardB = 01
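A sketch of the complete forwarding unit with the revised MEM hazard condition, which fires only when the EX hazard condition for the same source does not. `PipeDest` and `forwarding_unit` are illustrative names; select codes follow the slides (10 = EX/MEM, 01 = MEM/WB, 00 = value read in ID), and the RegWrite and `$0` checks are kept.

```python
from dataclasses import dataclass


@dataclass
class PipeDest:
    """Destination fields of a pipeline register (EX/MEM or MEM/WB)."""
    reg_write: bool
    rd: int


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for ID/EX.RegisterRs and ID/EX.RegisterRt."""
    def select(src):
        ex_hazard = ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src
        # Revised MEM hazard: only if the EX hazard condition isn't true.
        mem_hazard = (mem_wb.reg_write and mem_wb.rd != 0
                      and not ex_hazard and mem_wb.rd == src)
        if ex_hazard:
            return "10"  # most recent value, from EX/MEM
        if mem_hazard:
            return "01"  # older value, from MEM/WB
        return "00"      # value read from the register file in ID
    return select(rs), select(rt)


# Third add of the double-hazard sequence: add $1, $1, $4 in EX,
# add $1, $1, $3 in MEM, add $1, $1, $2 in WB.
print(forwarding_unit(1, 4, PipeDest(True, 1), PipeDest(True, 1)))  # ('10', '00')
```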


Load-Use Data Hazard

- Load-use data hazard: a load instruction is followed immediately by an instruction that uses the loaded value
- How is a load instruction different from an ALU instruction?
  - ALU instruction: destination register value available at the end of the EX stage
  - Load instruction: destination register value available at the end of the MEM stage
  - Note that the next instruction may need the value at the beginning of its EX stage


Load-Use Data Hazard

- Can't always avoid stalls by forwarding
  - If the value is not computed when needed
  - Can't forward backward in time!


Load-Use Data Hazard

Need to stall for one cycle


Load-Use Hazard

How to insert a pipeline bubble (lost cycle)?

    lw  $2, 20($1)
    sub $4, $2, $5
    or  $8, $2, $6

When the load instruction is at the EX stage:
- Hold the instruction at the IF stage
  - Do not update the PC
- Hold the instruction at the ID stage
  - Do not change the IF/ID register
- Insert a nop at the EX stage
  - Set all control signals in the ID/EX register to zero
  - In particular, RegWrite = 0 and MemWrite = 0
- Let the instructions in MEM and WB move forward


Load-Use Hazard Detection

To detect, check whether:
- A load instruction is at the EX stage
  - ID/EX.MemRead = 1
- The instruction at the ID stage reads the load's destination register
  - ID/EX.RegisterRt = IF/ID.RegisterRs, or
  - ID/EX.RegisterRt = IF/ID.RegisterRt (for R-type)

If detected, stall IF and ID, insert a bubble at EX, and let MEM and WB move forward.
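The detection condition translates directly into a predicate. This is a hypothetical sketch (`load_use_stall` is not from the source); its arguments are the pipeline-register fields named in the condition above.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True if the instruction in EX is a load (ID/EX.MemRead) whose
    destination, ID/EX.RegisterRt, is a source of the instruction in ID."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) in EX (rt = 2); sub $4, $2, $5 in ID (rs = 2, rt = 5)
print(load_use_stall(True, 2, 2, 5))   # True: stall one cycle
# An ALU instruction in EX instead: forwarding handles it, no stall
print(load_use_stall(False, 2, 2, 5))  # False
```

This version compares IF/ID.RegisterRt unconditionally, which is conservative: an I-type instruction whose rt field is a destination rather than a source may trigger an unnecessary stall.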


Pipeline Stall

- The nop has all control signals set to zero
  - It does nothing at EX, MEM and WB
- Prevent update of the PC and the IF/ID register
  - The using instruction is decoded again (OK)
  - The following instruction is fetched again (OK)
  - The 1-cycle stall allows MEM to read the data for lw
  - The value can subsequently be forwarded from WB to EX
- Need to add new control lines
  - PCWrite for holding or updating the PC
  - IF/IDWrite for holding or updating the IF/ID register
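The stall mechanics (PCWrite, IF/IDWrite, and a zeroed ID/EX) can be sketched as one clock step over a simplified pipeline state. Everything here is hypothetical: `Control`, `Pipeline`, and `clock` model only three control signals, hold instructions as assembly text, and ignore branches and datapath values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """A few control signals carried in a pipeline register."""
    reg_write: int = 0
    mem_read: int = 0
    mem_write: int = 0


BUBBLE = Control()  # all control signals zero: behaves as a nop


@dataclass(frozen=True)
class Pipeline:
    pc: int
    if_id: str        # instruction held in IF/ID (assembly text here)
    id_ex: Control    # control signals of the instruction in EX
    ex_mem: Control   # control signals of the instruction in MEM
    mem_wb: Control   # control signals of the instruction in WB


def clock(p: Pipeline, fetched: str, decoded: Control, stall: bool) -> Pipeline:
    """Advance one cycle. `fetched` is the instruction at p.pc; `decoded` is
    the control produced in ID for p.if_id. Branches are ignored."""
    pc_write = if_id_write = not stall  # PCWrite and IF/IDWrite
    return Pipeline(
        pc=p.pc + 4 if pc_write else p.pc,          # hold PC on a stall
        if_id=fetched if if_id_write else p.if_id,  # hold IF/ID on a stall
        id_ex=BUBBLE if stall else decoded,         # bubble enters EX
        ex_mem=p.id_ex,                             # MEM and WB move forward
        mem_wb=p.ex_mem,
    )


lw_ctrl = Control(reg_write=1, mem_read=1)
sub_ctrl = Control(reg_write=1)
# lw $2, 20($1) (address 0) is in EX, sub $4, $2, $5 (address 4) is in ID, PC = 8
state = Pipeline(pc=8, if_id="sub $4, $2, $5", id_ex=lw_ctrl,
                 ex_mem=BUBBLE, mem_wb=BUBBLE)
stalled = clock(state, fetched="or $8, $2, $6", decoded=sub_ctrl, stall=True)
resumed = clock(stalled, fetched="or $8, $2, $6", decoded=sub_ctrl, stall=False)
print(stalled)
print(resumed)
```

In the stalled cycle the PC and IF/ID keep their values while lw moves on to MEM behind a bubble; in the next cycle sub enters EX and can take the loaded value from WB.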


Stall/Bubble in the Pipeline

Stall inserted here


Stall/Bubble in the Pipeline

Or, more accurately…


Datapath with Hazard Detection


Stalls and Performance

- Stalls reduce performance
  - But are required to get correct results
- The compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

The BIG Picture


Code Scheduling to Avoid Stalls

Reorder code to avoid use of a load result in the next instruction.

C code for A = B + E; C = B + F;

Original order (13 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2    # stall: uses $t2 loaded by the preceding lw
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4    # stall: uses $t4 loaded by the preceding lw
    sw  $t5, 16($t0)

Reordered (11 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
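The two cycle counts can be checked with a simple model: 7 instructions take 7 + 4 cycles to drain a 5-stage pipeline, plus one stall each time an instruction uses the result of the immediately preceding `lw`. This assumes full forwarding everywhere else; `cycles` is an illustrative helper, and each instruction is an `(op, dest, srcs)` tuple.

```python
def cycles(program, stages=5):
    """Total cycles with full forwarding, where only a load followed
    immediately by a use of its destination stalls (one cycle)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return len(program) + (stages - 1) + stalls


original = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]
# Move the third load up, ahead of the first add.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]
print(cycles(original), cycles(reordered))  # 13 11
```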


Control Hazards

- A branch determines the flow of control
  - Two branch outcomes: Taken or Not-Taken
  - Fetching the next instruction depends on the branch outcome
  - The pipeline can't always fetch the correct instruction, because it is still working on the ID stage of the branch
- In the MIPS pipeline
  - Need to compare registers and compute the target early in the pipeline
  - Add hardware to do it in the ID stage


Control Hazards

Several caveats:
- The CPU doesn't recognize a branch until it reaches the end of the ID stage
- Every cycle, the CPU has to fetch one instruction
  - It cannot afford to wait and see
  - It must predict the next PC every cycle
- The CPU may predict "always not-taken" (MIPS 5-stage pipeline)
- Alternatively, the CPU may predict the branch outcome dynamically (advanced CPU design)


Control Hazards

This MIPS pipeline always predicts Not-Taken:
- Easy prediction: the next PC is the current PC plus 4
- No need to design a complex branch prediction unit
- Downside: most programs have more Taken than Not-Taken branches

What happens if the prediction is wrong?
- The pipeline will have mis-fetched instructions
- Flush those instructions before they take effect, i.e., before they write to memory or a register
- A Taken branch incurs a performance penalty


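Performance Impact

(§4.8 Control Hazards)

If the branch outcome is determined in MEM:

[Figure: pipeline with a taken branch resolved in MEM; the instructions fetched after the branch are flushed (their control values set to 0).]

Three cycles are wasted on a taken branch.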


Performance Impact

- The performance loss is 3 cycles per taken branch if the branch outcome is determined in MEM
- Move execution of the branch to the ID stage!
  - In the MIPS subset used here, the only conditional branches are beq and bne
  - Testing equality and inequality is very fast; do it at the end of ID
- The branch target can be calculated in ID
  - Branch target = (PC + 4) + (sign-extended offset × 4)
  - The PC and the offset are known at the beginning of ID
- At the end of ID, the CPU knows the branch outcome and the branch target
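As a worked check of the target formula, here is a sketch assuming the standard MIPS encoding: a 16-bit signed word offset relative to PC + 4. `sign_extend16` and `branch_target` are illustrative names. For `40: beq $1, $3, 7` in the Reducing Branch Delay example, the target is 44 + 7 × 4 = 72, the address of the `lw`.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset_field: int) -> int:
    """beq/bne target: (PC + 4) + (sign-extended 16-bit offset << 2), 32-bit."""
    return (pc + 4 + (sign_extend16(offset_field) << 2)) & 0xFFFFFFFF


print(branch_target(40, 7))       # 72
print(branch_target(40, 0xFFFD))  # offset -3: 44 - 12 = 32
```

Both additions (PC + 4 and the shifted offset) need only the fetched instruction and the PC, which is why the target adder can move into ID.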


Reducing Branch Delay

- Move the hardware that determines the outcome to the ID stage
  - Target address adder
  - Register comparator
- Example code with branch taken:

    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw  $4, 50($7)


Example: Branch Taken


Early Branch Outcome

Pipeline changes for early branch outcome:
- The 2nd PC adder and the shifter are moved to ID
- A comparator is added to ID
- The ALU's Zero output is no longer used for branches

The CPU flushes one instruction for every taken branch:
- The CPU detects a taken branch at ID
- The instruction at IF will be flushed
- 1 lost cycle instead of 3 lost cycles per taken branch
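To see what resolving the branch in ID buys, the usual CPI accounting can be sketched, assuming an ideal CPI of 1 and that taken branches (under predict-not-taken) are the only source of stalls. The branch and taken fractions below are made-up numbers for illustration only, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(branch_fraction: float, taken_fraction: float,
                  penalty_cycles: int, base_cpi: float = 1.0) -> float:
    """CPI when each taken branch adds `penalty_cycles` lost cycles
    (always-not-taken prediction, no other stalls)."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


# Hypothetical mix: 20% of instructions are branches, 60% of them taken.
print(effective_cpi(0.2, 0.6, 3))  # resolved in MEM: about 1.36
print(effective_cpi(0.2, 0.6, 1))  # resolved in ID: about 1.12
```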


Pipeline Flushing

When the CPU detects a taken branch at ID:
- Update the PC with the branch target (already computed)
- Flush the instruction in the IF stage
  - Add a flush signal to the IF/ID pipeline register
  - When flushing, convert the instruction in the IF/ID register to 32-bit zeros
  - 0x00000000 encodes sll $0, $0, 0, the standard MIPS nop
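A sketch of the flush path and of why an all-zero word is harmless: decoding 0x00000000 gives opcode 0 and funct 0 (sll) with rd = $0, so it changes no visible state. `decode_rtype` and `if_id_next` are illustrative helpers; 0x012A4020 is the encoding of `add $t0, $t1, $t2`, used here as the mis-fetched instruction.

```python
def decode_rtype(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
    }


def if_id_next(fetched: int, flush: bool) -> int:
    """Instruction word latched into IF/ID: all zeros when flushing."""
    return 0x00000000 if flush else fetched


print(decode_rtype(0x012A4020))                          # add $t0, $t1, $t2
print(decode_rtype(if_id_next(0x012A4020, flush=True)))  # every field is 0
```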


Example: Branch Taken

Note: Branch does nothing in EX, MEM and WB


Pipeline Bubble on Branch

A taken branch incurs a pipeline bubble because of instruction flushing.


Data Hazards for Branches

- Moving branch execution to ID is not so easy
- May need another forwarding unit
  - The forwarding unit has to be in the ID stage
  - The current forwarding unit, in the EX stage, obviously doesn't work
- Need extensions to the hazard detection unit, and more pipeline stalls
  - The branch uses register values at ID; ALU and load instructions produce register values at EX and MEM


Data Hazards for Branches

If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction:

    add $1, $2, $3        IF  ID  EX  MEM WB
    add $4, $5, $6            IF  ID  EX  MEM WB
    ...                           IF  ID  EX  MEM WB
    beq $1, $4, target                IF  ID  EX  MEM WB

- Can resolve using forwarding: from MEM to ID, and from WB to ID


Data Hazards for Branches

If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction:
- May need 1 stall cycle
- However, beq needs the value at the end of ID

    lw  $1, addr          IF  ID  EX  MEM WB
    add $4, $5, $6            IF  ID  EX  MEM WB
    beq stalled                   IF  ID
    beq $1, $4, target                    ID  EX  MEM WB


Data Hazards for Branches

If a comparison register is a destination of the immediately preceding load instruction:
- May need 2 stall cycles
- Again, beq needs the value at the end of ID, so it's possible to reduce the stall to one cycle

    lw  $1, addr          IF  ID  EX  MEM WB
    beq stalled               IF  ID
    beq stalled                       ID
    beq $1, $0, target                    ID  EX  MEM WB
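The three branch cases can be summarized as a small table of stall cycles. This sketch assumes the branch compares in ID with forwarding into ID, and that a load three or more slots back needs no stall (its WB write and the branch's ID read share a cycle, as in the no-WB-hazard caveat). `branch_stalls` and `stalls_for_branch` are hypothetical names.

```python
def branch_stalls(producer: str, distance: int) -> int:
    """Stalls for a branch resolved in ID when one comparison register is
    written by the instruction `distance` slots earlier (1 = immediately
    preceding). `producer` is "alu" or "load"."""
    if producer == "alu":
        # Result ready at the end of EX: wait one cycle if adjacent,
        # otherwise forward from MEM or WB to ID.
        return 1 if distance == 1 else 0
    if producer == "load":
        # Result ready at the end of MEM.
        return {1: 2, 2: 1}.get(distance, 0)
    raise ValueError(f"unknown producer kind: {producer!r}")


def stalls_for_branch(producers) -> int:
    """producers: (kind, distance) pairs, one per comparison register with a
    recent writer. The branch waits for the slowest one."""
    return max((branch_stalls(kind, dist) for kind, dist in producers), default=0)


print(stalls_for_branch([("alu", 3), ("alu", 2)]))   # add, add, ..., beq: 0
print(stalls_for_branch([("load", 2), ("alu", 1)]))  # lw, add, beq: 1
print(stalls_for_branch([("load", 1)]))              # lw, beq: 2
```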


Mini-Project C

In Mini-Project C, implement:
- The simple MIPS pipeline
- Data forwarding and hazard detection
- Not-taken branch prediction with pipeline flushing


Delayed Branch

- A delayed branch may remove the one-cycle stall
  - The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
  - In other words, the execution of beq is delayed by one cycle

    sub $10, $4, $8          beq $1, $3, 7
    beq $1, $3, 7     =>     sub $10, $4, $8
    and $12, $2, $5          and $12, $2, $5

- Must find an independent instruction; otherwise
  - May have to fill in a nop instruction, or
  - Need two variants of beq, delayed and not delayed
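The delay-slot rule can be sketched for the simplest case: moving the instruction just before the branch into the slot. This assumes that instruction is neither a branch nor a branch target; `fill_delay_slot` and `Instr` are illustrative names, and registers are plain numbers.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]    # register written, or None
    srcs: Tuple[int, ...]  # registers read


NOP = Instr("nop", None, ())


def fill_delay_slot(before: Instr, branch: Instr) -> List[Instr]:
    """Return the reordered code for a delayed branch.

    If the branch does not read the register that `before` writes, `before`
    moves into the delay slot; otherwise keep the order and pad with a nop."""
    if before.dest is None or before.dest not in branch.srcs:
        return [branch, before]
    return [before, branch, NOP]


sub = Instr("sub $10, $4, $8", 10, (4, 8))
beq = Instr("beq $1, $3, 7", None, (1, 3))
print([i.text for i in fill_delay_slot(sub, beq)])
# ['beq $1, $3, 7', 'sub $10, $4, $8']
```

If the preceding instruction were `add $1, $2, $3`, the branch would depend on it, so the sketch keeps the original order and fills the slot with a nop.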


Branch Prediction

- We've actually studied one form of branch prediction: always not-taken
- Longer pipelines can't readily determine the branch outcome early
  - The stall penalty becomes unacceptable
- Predict the outcome of a branch
  - Only stall if the prediction is wrong
- In the MIPS pipeline
  - Can predict branches not taken
  - Fetch the instruction after the branch, with no delay