multiple cycle processor summary

30
Savio Chau Multiple Cycle Processor Summary Essence of the Multiple Cycle Processor Break each Instruction into Smaller Steps Execute Each Step (Instead of the Entire Instruction) in One Cycle Cycle Time: Time It Takes to Execute the Longest Step Keep all the Steps to a Similar Length The Advantages of the Multiple Cycle Processor: Cycle Time is Much Shorter Different Instructions Take Different Number of Cycles to Complete Load Takes Five Cycles Jump Only Takes Three Cycles Allows a Functional Unit to be Used More than Once per Instruction The Disadvantages of the Multiple Cycle Processor: Each Instruction Needs Multiple Clock Cycles (although each cycle is much shorter than that of a single cycle data path) Only a Small Number of Function Units in The Data Path Are Active in Each Clock Cycle; Resources Are Wasted How to do better than multi-Cycle Processor? Pipelining

Upload: ronna

Post on 08-Jan-2016

30 views

Category:

Documents


0 download

DESCRIPTION

Multiple Cycle Processor Summary. Essence of the Multiple Cycle Processor Break each Instruction into Smaller Steps Execute Each Step (Instead of the Entire Instruction) in One Cycle Cycle Time: Time It Takes to Execute the Longest Step Keep all the Steps to a Similar Length - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Multiple Cycle Processor Summary

Savio Chau

Multiple Cycle Processor Summary• Essence of the Multiple Cycle Processor

– Break each Instruction into Smaller Steps

– Execute Each Step (Instead of the Entire Instruction) in One Cycle

– Cycle Time: Time It Takes to Execute the Longest Step

– Keep all the Steps to a Similar Length

• The Advantages of the Multiple Cycle Processor:– Cycle Time is Much Shorter

– Different Instructions Take Different Number of Cycles to Complete• Load Takes Five Cycles• Jump Only Takes Three Cycles

– Allows a Functional Unit to be Used More than Once per Instruction

• The Disadvantages of the Multiple Cycle Processor:– Each Instruction Needs Multiple Clock Cycles (although each cycle

is much shorter than that of a single cycle data path)

– Only a Small Number of Function Units in The Data Path Are Active in Each Clock Cycle; Resources Are Wasted

How to do better than multi-Cycle Processor? Pipelining

Page 2: Multiple Cycle Processor Summary

Savio Chau

Key Ideas Behind Pipelining• Example: Made- to- Order Hamburger Assembly Line:

– Assembly Line Stages: (1) Take the order, (2) Prepare the bun and sauce, (3) Prepare the meat and place it on the bun, (4) Layer the vegetables, (5) Package the product

– One Person for Each Stage

– Pass the Hamburger and Order to the Next person as Soon as One Finishes His Part

– Assume Each Stage Takes 3 minutes to Complete

– Each Individual Order Still Takes 15 minutes to Complete

– But with 5 People, All Orders can be Completed Much Faster

• Each Instruction has Multiple Stages– The Functional Units in Each Stage are Independent

– Each Functional Unit Is Used Only Once in Each Instruction

– The Next Instruction can Start as Soon as an Instruction Finishes Its Instruction Fetch Stage

– Each Instruction Still Takes The Same Number of Cycles to Complete

– The Throughput, However, is Much Higher

Page 3: Multiple Cycle Processor Summary

Savio Chau

Example: The 5 Stages of Load

Without Pipelining:Latency = 5 cycles

Throughput = 0.2 instructions/cycle (or cycles/instruction = 5)

• IFetch: Instruction Fetch– Fetch the Instruction from the Instruction Memory

• Reg/ Dec: Registers (Operand) Fetch and Instruction Decode• Exec: Calculate the Memory Address• Mem: Read the Data from the Data Memory• Wr: Write the Data Back to the Register File

Page 4: Multiple Cycle Processor Summary

Savio Chau

Pipelining The Load Instructions

• The Five Independent Functional Units In the Pipeline Datapath are:– Instruction Memory for the IFetch Stage– Register File’s Read Ports (busA and busB) for the Reg/ Dec Stage– ALU for the Exec Stage– Data Memory for the Mem Stage– Register File’s Write Port (busW) for the Wr Stage

• One Instruction Enters the Pipeline Every Cycle– One Instruction Comes Out of the Pipeline (Complete) Every Cycle– The “Effective” Cycles per Instruction (CPI) is 1

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Clk

1st lw

2nd lw

3rd lw

IFetch Reg/Dec Exec Mem WrBack

IFetch Reg/Dec Exec Mem WrBack

IFetch Reg/Dec Exec Mem WrBack

Question: How can we stack the execution of the instructions?Answer: By inserting “Pipeline Registers”

Page 5: Multiple Cycle Processor Summary

Savio Chau

Performance of An Ideal Pipeline

• Latency of Pipeline = Latency of a Single Task

• Potential Throughput Improvement = Number of Pipeline Stages Under The Ideal Situations That All Instructions Are Independent and No Branch Instructions

• Pipeline Rate is Limited by the Slowest Pipeline Stage

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Clk

1st lw

2nd lw

3rd lw

IFetch Reg/Dec Exec Mem WrBack

IFetch Reg/Dec Exec Mem WrBack

IFetch Reg/Dec Exec Mem WrBack

Page 6: Multiple Cycle Processor Summary

Savio Chau

Mixing Different Types of Instructions in a Pipeline

Load Instruction• IFetch: Instruction Fetch• Reg/ Dec: Registers (Operand) Fetch and

Instruction Decode• Exec: Calculate the Memory Address• Mem: Read the Data from the Data Memory• Wr: Write the Data Back to the Register File

R-Type Instruction• IFetch: Instruction Fetch• Reg/ Dec: Registers (Operand) Fetch and

Instruction Decode• Exec: ALU Operates on the Two Register

Operands• Wr: Write the Data Back to the Register File

Load and R-Type Instructions in Multi Cycle Data Path

Page 7: Multiple Cycle Processor Summary

Savio Chau

Pipelining an R- type and a Load Instruction

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Clk

R Type

R Type

lw

IFetch Reg/Dec Exec WrBack

IFetch Reg/Dec Exec WrBack

IFetch Reg/Dec Exec Mem WrBack

R Type IFetch Reg/Dec Exec WrBack

R Type IFetch Reg/Dec Exec WrBack

Cycle 8

Oops! We Get a Problem!

• Both the Load and R-Type Instructions Try to Write to the Register File at the Same Time!

Page 8: Multiple Cycle Processor Summary

Savio Chau

Solution: Delay R- Type’s Write by One Cycle

• This actually can easily be done. Just make sure all instructions pass through the same number of pipeline stages and set the control signals of the unused stages to NOP (i.e., all control signals set to 0)– Now R- Type Instructions Also Use Reg File’s Write Port at Stage 5– Mem Stage is a NOP Stage: Nothing is Being Done

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Clk

R Type

R Type

lw

IFetch Reg/Dec Exec Mem/Nop

IFetch Reg/Dec Exec Mem/Nop

IFetch Reg/Dec Exec Mem WrBack

R Type IFetch Reg/Dec Exec Mem/Nop

R Type IFetch Reg/Dec Exec Mem/Nop

Cycle 8

WrBack

WrBack

WrBack

WrBack

R Type IFetch Reg/Dec Exec Mem/Nop WrBack

Page 9: Multiple Cycle Processor Summary

Savio Chau

A Pipelined Datapath

Notice how the pipeline registers are labeled

Page 10: Multiple Cycle Processor Summary

Savio Chau

The Instruction Fetch Stage

Ins

tru

ct

Fe

tch

Un

it

lw instruction

Page 11: Multiple Cycle Processor Summary

Savio Chau

The Decode / Register Fetch Stage

The 2nd (R-type) instruction can start instruction fetch

lw instruction

Ins

tru

ct

Fe

tch

Un

it

Page 12: Multiple Cycle Processor Summary

Savio Chau

Load’s Address Calculation Stage

The 3rd (R-type) instruction can start instruction fetch

The 2nd (R-type) instruction can start decode/reg fetch

lw instruction

Ins

tru

ct

Fe

tch

Un

it

Page 13: Multiple Cycle Processor Summary

Savio Chau

Load’s Memory Access Stage

The 4th (R-type) instruction can start instruction fetch

The 3rd (R-type) instruction can start decode/reg fetch

The 2nd (R-type) instruction can start Execution

lw instruction

Ins

tru

ct

Fe

tch

Un

it

Page 14: Multiple Cycle Processor Summary

Savio Chau

Load’s Write Back Stage

The 5th (R-type) instruction can start instruction fetch

The 4th (R-type) instruction can start decode/reg fetch

The 3rd (R-type) instruction can start Execution

The 2nd (R-type) instruction is no-op in memory access

lw instruction

Ins

tru

ct

Fe

tch

Un

it

Page 15: Multiple Cycle Processor Summary

Savio Chau

How About Control Signals?Key Observations• Control signal at stage N is the function of only the instruction at that stage• When an instruction is in stage N, only the control signals in that stage are

relevant to the instructionIn

str

uc

t F

etc

h U

nit

Page 16: Multiple Cycle Processor Summary

Savio Chau

Pipelining the Control Signals

• The above observations imply the control signals can be pipelined:– The Main Control Generates the Control Signals During Reg/ Dec– Control Signals for Exec (ExtOp, ALUSrc, ...) are Used 1 Cycle Later– Control Signals for Mem (MemWr Branch) are Used 2 Cycles Later– Control Signals for Wr (MemtoReg RegWr) are Used 3 Cycles Later

Page 17: Multiple Cycle Processor Summary

Savio Chau

lw lwadd lwaddlw lwaddsub lw

Putting It All Together

Co

ntr

ol

Sig

na

ls

Co

ntr

ol

Sig

na

ls

Co

ntr

ol

Sig

na

lsExtOpALUSrcALUOpRegDstMemWrBranchMemtoRegRegWrMa

in C

on

tro

l

Inst

ruct

ion

Fet

ch

Un

it

PC

IF/I

D R

egis

ter

ID/E

X R

egis

ter

EX

/Mem

Re

gis

ter

Mem

/Wr

Re

gis

ter

RFileA

PC+4

I

Rs

Rt

Rt

Rd

Ra

Rb

Wa Wd

Imm16

Exec Unit

Imm16

Bus A

Bus B

Target

0

1

Zero Data MemRaWaWd

1

0

0

1

Page 18: Multiple Cycle Processor Summary

Savio Chau

Add Beq Instructions to the Pipeline

Store Instruction• IFetch: Instruction Fetch from Instruction

Memory• Reg/ Dec: Registers (Operand) Fetch and

Instruction Decode• Exec: Calculate the Memory Address• Mem: Write the Data into the Data Memory• Wr: Not applicable. All control signals set to 0

Beq Instruction• IFetch: Instruction Fetch from the Instruction

Memory• Reg/ Dec: Registers (Operand) Fetch and

Instruction Decode• Exec: ALU compares the Two Register

Operands– Adder Calculates the Branch Target Address

• Mem: If the Registers we Compared in the Exec Stage are the Same

– Write the Branch Target Address into the PC• Wr: Not applicable. All control signals set to 0

Store and Beq Instructions in Multi Cycle Data Path

Page 19: Multiple Cycle Processor Summary

Savio Chau

Pipelining Example with Beq

Page 20: Multiple Cycle Processor Summary

Savio Chau

Pipelining Example: End of Cycle 4

12: Beq’s ifetch 8: Store’s Dec 4: R-types’ Exec 0: Load’s Mem

Ins

tru

ct

Fe

tch

Un

it

Page 21: Multiple Cycle Processor Summary

Savio Chau

Pipelining Example: End of Cycle 5

12: Beq’s Dec 8: Store’s Exec 4: R-types’ Mem 0: Load’s Wr16: R-type’s ifetch

Ins

tru

ct

Fe

tch

Un

it

Page 22: Multiple Cycle Processor Summary

Savio Chau

Pipelining Example: End of Cycle 6

12: Beq’s Exec 8: Store’s Mem 4: R-types’ Wr16: R-type’s Dec20: R-type’s ifetch

Ins

tru

ct

Fe

tch

Un

it

Page 23: Multiple Cycle Processor Summary

Savio Chau

Pipelining Example: End of Cycle 7

12: Beq’s Mem(select PC addr)

20: R-type’s Dec 8: Store’s Wr(No op!!)

24: R-type’s ifetch 16: R-type’s Exec

Ins

tru

ct

Fe

tch

Un

it

Page 24: Multiple Cycle Processor Summary

Savio Chau

Example of Detailed Pipeline Operations

See MIPS Example in Class

rt

rd

ID/E

XPC

Addr

InstructionMemory

Rd Reg1RdReg2

RegistersWr RegWr Data

AddrRd Data

DataMemory

Wr Data

PCsrc IF/ID

4 Reg

Writ

e

ALU

src

ALUop

RegDst

Branch

Mem

Wr

Mem

toR

zero

out

<15:0>

Mem

Rd

rs

A

Zero

ALUout

0wb

exm

wb

IF/I

D

mwb

EX

/ME

M

ME

M/W

B

1

rt

Extrt

rdMux

Co

ntr

ol

0

1

Ad

d Ad

d

B

ALUControl

Mux

AL

U

0

1

rd

BMux

0

1

rd

ID/EX EX/MEM MEM/WB--/IF

Mux

AL

Uo

ut

md

o

<10:0>

<31:0>

<31:26>

x4

Clk PC 1 00 lw $2, 0($3) 2 04 add $4, $0, $5 3 08 sw $6, 4($3) 4 12 addi $7, $2, 100 5 16 add $8, $2, $5 6 20 add $9, $2, $4 7 24 sub $10, $4, $7 8 28 add $11, $7, $8

Clk PC 1 00 lw $2, 0($3) 2 04 add $4, $0, $5 3 08 sw $6, 4($3) 4 12 addi $7, $2, 100 5 16 add $8, $2, $5 6 20 add $9, $2, $4 7 24 sub $10, $4, $7 8 28 add $11, $7, $8

Page 25: Multiple Cycle Processor Summary

Savio Chau

Signal Propagation through the Example Pipeline

Instruct in

PC

PC

Instru

ct in IF/ID

IF/ID

.rs

IF/ID

.rt

IF/ID

.rd

IF/ID

.Imm

ed

16

Instru

ct in ID/E

X

ID/E

X.A

ID/E

X.B

ID/E

X.Im

me

d16

ID/E

X.rt

ID/E

X.rd

ID/E

X.A

LUsrc

ID/E

X.A

LUo

p

ID/E

X.R

egD

st

ID/E

X.B

ranch

ID/E

X.M

emW

r

ID/E

X.M

emR

d

ID/E

X.M

emtoR

ID/E

X.R

egW

rite

Instruct in

EX

/ME

M

EX

/ME

M.A

LUo

ut

EX

/ME

M.B

EX

/ME

M.rd

EX

/ME

M.b

ran

chad

d E

X/M

EM

.Zero

EX

/ME

M.B

ranch

EX

/ME

M.M

emW

r

EX

/ME

M.M

emR

d

EX

/ME

M.M

emtoR

EX

/ME

M.R

eg

Write

Instru

ct in ME

M/W

B

ME

M/W

B.m

do

ME

M/W

B.A

LUo

ut

ME

M/W

B.rd

ME

M/W

B.M

emto

R

ME

M/W

B.R

egW

rite

Clo

ck 4

ad

d

16

ad

di

2 7

X

100

sw

$3

$6

4 6

X

1

add

X

0 1 0

X

0

add

$0 +

$5

X

4

X

X

0 0 0 1 1

lw

Mem

[$3+

0]

$3 +

0

2 0 1

Clo

ck 5

add

20

add

2 5 8

X

ad

di

$2

$7

100

7

X

1

add

0 0 0 0 1 1

sw

$3 +

4

$6

X

X

X

X

1 0

X

0

add

X

$0 +

$5

4 1 1

Clo

ck 6

sub

24

add

2 4 9

X

add

$2

$5

X

5 8 0

add

1 0 0 0 1 1

ad

di

$2 +

100

X

7

X

X

0 0 0 1 1

sw

X

X

X

X

0

Clo

ck 7

sub

116

sub

4 7

10

X

add

$2

$4

X

4 9 0

add

1 0 0 0 1 1

add

$2 +

$5

$5

8

X

X

0 0 0 1 1

ad

di

X

sum

7 1 1

Page 26: Multiple Cycle Processor Summary

Savio Chau

LEON: Another Pipeline Processor Example

Page 27: Multiple Cycle Processor Summary

Savio Chau

Comparing Single Cycle, Multiple Cycle, & Pipeline

• Disadvantages of the Single Cycle Processor– Long cycle time

– Cycle time is too long for all instructions except the Load

• Multiple Clock Cycle Processor– Divide the instructions into smaller steps

– Execute each step (instead of the entire instruction) in one cycle

• Pipeline Processor– Natural enhancement of the multiple clock cycle processor

– Each functional unit can only be used once per instruction

– If an instruction is going to use a functional unit:• It must use it at the same stage as all other instructions

– Pipeline Control:• Each stage’s control signal depends ONLY on the instruction that is

currently in that stage

Page 28: Multiple Cycle Processor Summary

Savio Chau

Single Cycle, Multiple Cycle, vs. Pipeline

Idle

Page 29: Multiple Cycle Processor Summary

Savio Chau

Write with Real Memory: A Real World Problem

• At the Beginning of the Wr Stage, We Have a Problem If:– RegAdr’s (Rd or Rt) Clk- to- Q > RegWr’s Clk- to- Q

• Similarly, at the Beginning of the Mem Stage, We Have a Problem If:– WrAdr’s Clk- to- Q > MemWr’s Clk- to- Q

• Write signal happens too early and invalid data will be written. We Have a Race Condition Between Address and Write Enable!

Page 30: Multiple Cycle Processor Summary

Savio Chau

Synchronize Register File & Synchronize Memory

• Solution: And the Write Enable Signal with the Clock– This is the ONLY place where gating the clock is used

– MUST consult circuit expert to ensure no timing violation:• Example: Clock High Time > Write Access Delay