lecture 17: basic pipelining - school of computingrajeev/cs3810/slides/3810-17.pdf · • perfect...
TRANSCRIPT
![Page 1: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/1.jpg)
1
Lecture 17: Basic Pipelining
• Today’s topics:
� 5-stage pipeline� Hazards and instruction scheduling
• Mid-term exam stats:� Highest: 90, Mean: 58
![Page 2: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/2.jpg)
2
Multi-Cycle Processor
• Single memory unit shared by instructions and memory• Single ALU also used for PC updates• Registers (latches) to store the result of every block
![Page 3: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/3.jpg)
3
The Assembly Line
A
Start and finish a job before moving to the next
Time
Jobs
Break the job into smaller stagesB C
A B C
A B C
A B C
Unpipelined
Pipelined
![Page 4: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/4.jpg)
4
Performance Improvements?
• Does it take longer to finish each individual job?
• Does it take shorter to finish a series of jobs?
• What assumptions were made while answering thesequestions?
• Is a 10-stage pipeline better than a 5-stage pipeline?
![Page 5: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/5.jpg)
5
Quantitative Effects
• As a result of pipelining:� Time in ns per instruction goes up� Each instruction takes more cycles to execute� But… average CPI remains roughly the same� Clock speed goes up� Total execution time goes down, resulting in lower
average time per instruction� Under ideal conditions, speedup
= ratio of elapsed times between successive instructioncompletions
= number of pipeline stages = increase in clock speed
![Page 6: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/6.jpg)
6
A 5-Stage Pipeline
![Page 7: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/7.jpg)
7
A 5-Stage Pipeline
Use the PC to access the I-cache and increment PC by 4
![Page 8: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/8.jpg)
8
A 5-Stage Pipeline
Read registers, compare registers, compute branch target; for now, assumebranches take 2 cyc (there is enough work that branches can easily take more)
![Page 9: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/9.jpg)
9
A 5-Stage Pipeline
ALU computation, effective address computation for load/store
![Page 10: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/10.jpg)
10
A 5-Stage Pipeline
Memory access to/from data cache, stores finish in 4 cycles
![Page 11: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/11.jpg)
11
A 5-Stage Pipeline
Write result of ALU computation or load into register file
![Page 12: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/12.jpg)
12
Conflicts/Problems
• I-cache and D-cache are accessed in the same cycle – ithelps to implement them separately
• Registers are read and written in the same cycle – easy todeal with if register read/write time equals cycle time/2(else, use bypassing)
• Branch target changes only at the end of the second stage-- what do you do in the meantime?
• Data between stages get latched into registers (overheadthat increases latency per instruction)
![Page 13: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/13.jpg)
13
Hazards
• Structural hazards: different instructions in different stages(or the same stage) conflicting for the same resource
• Data hazards: an instruction cannot continue because itneeds a value that has not yet been generated by anearlier instruction
• Control hazard: fetch cannot continue because it doesnot know the outcome of an earlier branch – special caseof a data hazard – separate category because they aretreated in different ways
![Page 14: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/14.jpg)
14
Structural Hazards
• Example: a unified instruction and data cache �stage 4 (MEM) and stage 1 (IF) can never coincide
• The later instruction and all its successors are delayeduntil a cycle is found when the resource is free � theseare pipeline bubbles
• Structural hazards are easy to eliminate – increase thenumber of resources (for example, implement a separateinstruction and data cache)
![Page 15: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/15.jpg)
15
Data Hazards
![Page 16: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/16.jpg)
16
Bypassing
• Some data hazard stalls can be eliminated: bypassing
![Page 17: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/17.jpg)
17
Data Hazard Stalls
![Page 18: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/18.jpg)
18
Data Hazard Stalls
![Page 19: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/19.jpg)
19
Example
add $1, $2, $3
lw $4, 8($1)
![Page 20: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/20.jpg)
20
Example
lw $1, 8($2)
lw $4, 8($1)
![Page 21: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/21.jpg)
21
Example
lw $1, 8($2)
sw $1, 8($3)
![Page 22: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/22.jpg)
22
Control Hazards
• Simple techniques to handle control hazard stalls:� for every branch, introduce a stall cycle (note: every
6th instruction is a branch!)� assume the branch is not taken and start fetching the
next instruction – if the branch is taken, need hardwareto cancel the effect of the wrong-path instruction
� fetch the next instruction (branch delay slot) andexecute it anyway – if the instruction turns out to beon the correct path, useful work was done – if theinstruction turns out to be on the wrong path,hopefully program state is not lost
![Page 23: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/23.jpg)
23
Branch Delay Slots
![Page 24: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/24.jpg)
24
Slowdowns from Stalls
• Perfect pipelining with no hazards � an instructioncompletes every cycle (total cycles ~ num instructions)� speedup = increase in clock speed = num pipeline stages
• With hazards and stalls, some cycles (= stall time) go byduring which no instruction completes, and then the stalledinstruction completes
• Total cycles = number of instructions + stall cycles
![Page 25: Lecture 17: Basic Pipelining - School of Computingrajeev/cs3810/slides/3810-17.pdf · • Perfect pipelining with no hazards an instruction completes every cycle (total cycles ~ num](https://reader031.vdocuments.site/reader031/viewer/2022040623/5d40dc9788c9938c3f8d18ca/html5/thumbnails/25.jpg)
25
Title
• Bullet