designing a pipelined cpu
DESCRIPTION
Read 4.7 for next class. Designing a Pipelined CPU. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Review -- Single Cycle CPU. Review -- Multiple Cycle CPU. Ifetch. - PowerPoint PPT PresentationTRANSCRIPT
Designing a Pipelined CPU
Read 4.7 for next class
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Review -- Single Cycle CPU
Review -- Multiple Cycle CPU
Review -- Instruction Latencies
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
•Single-Cycle CPU
•Multiple Cycle CPU
Ifetch Reg/Dec Exec WrAdd
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Which of the statements below is true about a pipelined processor?
Selection Statement
A Instruction latency remains essentially unchanged from single-cycle (minus some overheads); Instruction throughput increases
B Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); Instruction throughput increases
C Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); Instruction throughput increases
D Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); Instruction throughput increases
E None of the above
Instruction Latencies and Throughput
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Pipelining Advantages
• Higher maximum throughput
• Higher utilization of CPU resources
• But, more complicated datapath, more complex control(?)
PI throughputVs. latency
Pipelining ThroughputPeek Throughput
1. Longest instruction2. Average instruction3. Cycle time
PI Matching
Selection SC MC Pipeline
A 1 2 3B 1 2 1C 1 3 2D 3 1 2E None of the above
A Pipelined Datapath
IF: Instruction fetch
ID: Instruction decode and register fetch
EX: Execution and effective address calculation
MEM: Memory access
WB: Write back
Idea for a Pipelined Datapath
Execution in a Pipelined Datapath
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9
lw
lw
lw
lw
lw
IF ID EX MEM WB
IF ID EX MEM WB
Execution in a Pipelined Datapath
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9
lw
lw
lw
lw
lw
steadystate
steadystate
IF ID EX MEM WB
IF ID EX MEM WB
Make sure to pointOut Steady StateCPI = 1
Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?
Selection Yes/No Reason (Choose BEST answer)
A Yes Decreasing R-type to 4 cycles improves instruction throughput
B Yes Decreasing R-type to 4 cycles improves instruction latency
C No Decreasing R-type to 4 cycles causes hazards
D No Decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput
E No Decreasing R-type to 4 cycles causes hazards and doesn’t impact latency
Pipeline Stages
Mixed Instructions in the Pipeline
IM Reg
AL
U Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6
lw
add
Pipeline Principles
• All instructions that share a pipeline must have the same stages in the same order.– therefore, add does nothing during Mem stage
– sw does nothing during WB stage
• All intermediate values must be latched each cycle.
• There is no functional block reuse
IM Reg A
LU DM Reg
IF ID EX MEM WB
Pipelined DatapathInstruction Fetch Instruction Decode/
Register FetchExecute/
Address CalculationMemory Access Write Back
registers!registers!
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in Executionadd $10, $1, $2 Instruction Decode/
Register FetchExecute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Draw active datapath throughexample
The Pipeline in Executionlw $12, 1000($4) add $10, $1, $2 Execute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in Executionsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Memory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in ExecutionInstruction Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register Fetchsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register Fetchsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Did someone spot the error??Write Register is wrong
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register FetchExecute/
Address Calculationsub $15, $4, $1 lw $12, 1000($4)
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline, with controls But….
Pipeline Control
Selection SC MC Pipeline
A X Y YB Y X XC X Y XD Y X YE None of the above
Control Logic
X. Combinational LogicY. FSM or Microprogram
Pipelined Control
IF/I
D
ID/E
X
EX
/ME
M
ME
M/W
B
controlinstruction
Pipelined Control Signals
Execution Stage Control Lines Memory Stage Control Lines Write Back Stage ControlLines
Instruction RegDst ALUOp1 ALUOp0 ALUSrc Branch MemRead MemWrite RegWrite MemtoRegR-Format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw x 0 0 1 0 0 1 0 xbeq x 0 1 0 1 0 0 0 x
The Pipeline with Control Logic
Is it really this simple?
IM Reg
AL
U DM Reg
IM Reg
AL
U DM
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
What just happened here which is problematic (BEST ANSWER)?A. The register file is trying to read and write the same registerB. The ALU and data memory are both active in the same cycleC. A value is used before it is producedD. Both A and BE. Both A and C Neg edge!
Data Hazards• When a result is needed in the pipeline before it is
available, a “data hazard” occurs.
IM Reg
AL
U DM Reg
IM Reg
AL
U DM
IM Reg
AL
U DM Reg
IM Reg A
LU DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
R2 AvailableR2 Available
R2 NeededR2 Needed
Use red and black lines!
Hazards continued
• What happens when...add $3, $10, $11
lw $8, 1000($3)
sub $11, $8, $7
Draw dependencies
The Pipeline in Executionlw $8, 1000($3) add $3, $10, $11 Execute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in Executionsub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Memory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
The Pipeline in Executionadd $10, $1, $2 sub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Pipelining Key Points
• ET = IC * CPI * CT
• We achieve high throughput without reducing instruction latency.
• Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).
• Pipelining uses combinational logic to generate (and registers to propagate) control signals.
• Pipelining creates potential hazards.
Summary comparison (MIPS designs)
CPI CT
Single Cycle
Multi-cycle
Pipeline