basic concepts1
-
7/31/2019 Basic Concepts1
Unit 3: Pipelining
-
Overview
Pipelining is widely used in modern processors.
Pipelining improves system performance in terms of throughput.
Pipelined organization requires sophisticated compilation techniques.
-
Basic Concepts
-
Making the Execution of Programs Faster
Use faster circuit technology to build the processor and the main memory.
Arrange the hardware so that more than one operation can be performed at the same time.
With the second approach, the number of operations performed per second increases even though the elapsed time needed to perform any one operation does not change.
-
Traditional Pipeline Concept
[Figure: four tasks labeled A, B, C, and D.]
-
Traditional Pipeline Concept
[Figure: tasks A, B, C, and D carried out one after another. Each task consists of three steps taking 30, 40, and 20 minutes; the time axis runs from 6 PM to midnight.]
-
Traditional Pipeline Concept
[Figure: tasks A, B, C, and D overlapped as a pipeline. Vertical axis: task order. Horizontal axis: time, marked from 6 PM to midnight. Timeline segments: 30, 40, 40, 40, 40, 20 minutes.]
-
Traditional Pipeline Concept
[Figure: the pipelined schedule for tasks A, B, C, and D. Vertical axis: task order. Horizontal axis: time, marked from 6 PM to 9. Timeline segments: 30, 40, 40, 40, 40, 20 minutes.]
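As a rough check of the timings on these slides, the sketch below compares the two schedules. It assumes, as the step durations 30, 40, and 20 suggest, that every task has the same three steps (in minutes) and that a task may wait between steps. The function names are illustrative only.

```python
def sequential_minutes(step_minutes, num_tasks):
    """Total time when each task finishes all its steps before the next starts."""
    return num_tasks * sum(step_minutes)


def pipelined_minutes(step_minutes, num_tasks):
    """Total time when tasks overlap and each step handles one task at a time.

    Assumption: all tasks have identical step durations, so the slowest step
    sets the pace and each extra task adds one slowest-step interval.
    """
    return sum(step_minutes) + (num_tasks - 1) * max(step_minutes)


steps = [30, 40, 20]   # minutes per step, as read from the slide
tasks = 4              # tasks A, B, C, D

print(sequential_minutes(steps, tasks))  # 360 minutes: 6 PM to midnight
print(pipelined_minutes(steps, tasks))   # 210 minutes: 30 + 4*40 + 20
```

No single task finishes sooner than 90 minutes after it starts; the saving comes entirely from overlapping the tasks.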
-
Use the Idea of Pipelining in a Computer
[Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution: instructions I1, I2, I3 are processed as F1, E1, F2, E2, F3, E3 along the time axis. (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit. (c) Pipelined execution: the fetch (F) and execution (E) steps of I1, I2, I3 overlap across clock cycles 1 to 4.]
-
Use the Idea of Pipelining in a Computer
[Figure 8.2. A 4-stage pipeline. (a) Instruction execution divided into four steps: instructions I1 to I4 pass through F, D, E, and W over clock cycles 1 to 7. (b) Hardware organization: stages F (fetch instruction), D (decode instruction and fetch operands), E (execute operation), and W (write results), separated by interstage buffers B1, B2, B3.]
Textbook page: 457
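A minimal sketch of the stall-free schedule in Figure 8.2, assuming every stage takes exactly one clock cycle and a new instruction enters the pipeline in each cycle. The helper names `ideal_schedule` and `total_cycles` are hypothetical.

```python
STAGES = ["F", "D", "E", "W"]


def ideal_schedule(num_instructions, stages=STAGES):
    """Return {instruction: {clock cycle: step label}} for a stall-free pipeline.

    Assumption: instruction i (1-based) enters the first stage in cycle i
    and advances one stage per cycle.
    """
    schedule = {}
    for i in range(1, num_instructions + 1):
        schedule[i] = {i + s: f"{name}{i}" for s, name in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles until the last instruction leaves the last stage."""
    return num_instructions + num_stages - 1


sched = ideal_schedule(4)
last = total_cycles(4)   # 7 cycles for 4 instructions in 4 stages
print("Clock cycle " + " ".join(f"{c:>3}" for c in range(1, last + 1)))
for i, steps in sched.items():
    row = " ".join(f"{steps.get(c, ''):>3}" for c in range(1, last + 1))
    print(f"I{i:<10} " + row)
```

Each printed row is one instruction and each column one clock cycle, so the diagonal pattern shows a new instruction completing every cycle once the pipeline is full.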
-
Role of Cache Memory
Each pipeline stage is expected to complete in one clock cycle.
The clock period should be long enough to let the slowest pipeline stage complete. Faster stages can only wait for the slowest one to finish.
Main memory is very slow compared to execution, so if every instruction had to be fetched from main memory, the pipeline would be almost useless.
Fortunately, we have cache memory, which is much faster.
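To illustrate why the slowest stage sets the clock, the sketch below uses made-up stage delays in nanoseconds. None of these numbers come from the slides; they only contrast a fetch served by a cache with one served by main memory.

```python
def clock_period(stage_delays_ns):
    """The clock must be long enough for the slowest stage to complete."""
    return max(stage_delays_ns.values())


def idle_time_per_cycle(stage_delays_ns):
    """Time each stage spends waiting within one clock period."""
    period = clock_period(stage_delays_ns)
    return {stage: period - delay for stage, delay in stage_delays_ns.items()}


# Hypothetical delays: fetch served by a cache versus by main memory.
with_cache = {"F": 2, "D": 1, "E": 2, "W": 1}
without_cache = {"F": 20, "D": 1, "E": 2, "W": 1}

print(clock_period(with_cache))             # 2
print(clock_period(without_cache))          # 20
print(idle_time_per_cycle(without_cache))   # {'F': 0, 'D': 19, 'E': 18, 'W': 19}
```

With the slow fetch, the other three stages sit idle for most of every cycle, which is the sense in which the pipeline becomes almost useless.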
-
Pipeline Performance
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages required the same time to complete and there were no interruptions throughout program execution. Unfortunately, this is not the case in practice.
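A sketch of the ideal-case arithmetic behind this claim, assuming k stages of one clock cycle each, n instructions, no stalls, and a non-pipelined processor that spends k cycles on every instruction. The symbols n, k, T, and S are introduced here for illustration only.

```latex
\begin{align*}
T_{\text{seq}}  &= n\,k, \\
T_{\text{pipe}} &= k + (n - 1), \\
S(n) &= \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1}
      \;\xrightarrow{\;n \to \infty\;}\; k .
\end{align*}
```

For n = 4 and k = 4 this gives S = 16/7, or about 2.3, so the speedup approaches the number of stages only for long instruction sequences.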
-
Pipeline Performance
[Figure 8.3. Effect of an execution operation taking more than one clock cycle. Timing diagram of instructions I1 to I5 passing through the F, D, E, and W steps over clock cycles 1 to 9.]
-
Pipeline Performance
The previous pipeline is said to have been stalled for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.
Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
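The two-cycle stall mentioned on this slide can be reproduced with a small timing model. This is a sketch under simplifying assumptions, not the textbook's exact diagram: instructions stay in program order, each stage handles one instruction at a time, and a finished step may wait in an interstage buffer. The function name and the three-cycle execute step for I2 are assumptions chosen to match the stated stall.

```python
def completion_cycles(durations):
    """Return the clock cycle in which each instruction finishes its last stage.

    durations[i][s] is the number of cycles instruction i spends in stage s.
    Assumptions (not from the slides): in-order execution, one instruction
    per stage at a time, and idealized buffering between stages.
    """
    num_stages = len(durations[0])
    prev_finish = [0] * num_stages   # finish times of the previous instruction
    done = []
    for stage_cycles in durations:
        finish = []
        ready = 0                    # when this instruction left the prior stage
        for s, cycles in enumerate(stage_cycles):
            start = max(ready, prev_finish[s])   # wait for the stage to be free
            ready = start + cycles
            finish.append(ready)
        prev_finish = finish
        done.append(finish[-1])
    return done


# Four instructions, stages F, D, E, W. I2's execute step is assumed to
# take three cycles, which produces the two-cycle stall.
program = [
    [1, 1, 1, 1],
    [1, 1, 3, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
done = completion_cycles(program)
ideal = len(program) + len(program[0]) - 1
print(done)               # [4, 7, 8, 9]
print(done[-1] - ideal)   # 2 stall cycles
```

Only the completion cycles are meaningful in this model; where exactly the intermediate steps wait depends on how the interstage buffers behave.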
-
Pipeline Performance
[Figure 8.4. Pipeline stall caused by a cache miss in F2. (a) Instruction execution steps in successive clock cycles, for instructions I1 to I3 over clock cycles 1 to 9. (b) Function performed by each processor stage in successive clock cycles: stage F performs F1, F2, F2, F2, F2, F3, while stages D, E, and W are each idle for three cycles between their first and second operations. The idle periods are called stalls (bubbles). This is an instruction hazard.]
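The per-stage view in part (b) of Figure 8.4 can be sketched as follows. The sketch assumes that only the fetch time varies, that F2 occupies the fetch stage for four cycles because of the cache miss, and that every other step takes one cycle. The name `stage_view` is hypothetical.

```python
from itertools import accumulate

STAGES = ["F", "D", "E", "W"]


def stage_view(fetch_cycles, stages=STAGES):
    """Return {stage: [label per clock cycle]} when only fetch time varies.

    fetch_cycles[i] is how many cycles the fetch of instruction i+1 takes.
    Assumption: every other stage takes one cycle, so an instruction reaches
    stage number s exactly s cycles after its fetch ends.
    """
    first = stages[0]
    fetch_end = list(accumulate(fetch_cycles))
    last_cycle = fetch_end[-1] + len(stages) - 1
    view = {name: ["idle"] * last_cycle for name in stages}
    for i, (end, length) in enumerate(zip(fetch_end, fetch_cycles), start=1):
        for c in range(end - length + 1, end + 1):      # cycles spent fetching
            view[first][c - 1] = f"{first}{i}"
        for s, name in enumerate(stages[1:], start=1):  # later stages follow
            view[name][end + s - 1] = f"{name}{i}"
    return view


# F2 is assumed to take four cycles because of the cache miss.
for name, row in stage_view([1, 4, 1]).items():
    print(f"{name}: " + " ".join(f"{x:>4}" for x in row))
```

In the printed rows, the idle entries between the first and second operation of each stage are the stalls; idle entries at the very start and end are just the pipeline filling and draining.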
-
Pipeline Performance
[Figure 8.5. Effect of a Load instruction on pipeline timing. Instructions I1 to I5 over clock cycles 1 to 7. I2 is Load X(R1), R2 and includes a memory access step M2 in addition to E2. The resulting conflict is a structural hazard.]
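One way to see where a structural hazard arises is to list which resource each instruction would want in each cycle if nothing stalled. The sketch below does this for the Load case, assuming that the Load needs an extra memory step M before W and that the write step is the shared resource; the figure residue does not say which resource is in conflict, so that part is an assumption. The name `find_conflicts` is hypothetical.

```python
from collections import defaultdict


def find_conflicts(programs):
    """List (cycle, resource, instructions) where a resource is wanted twice.

    programs maps an instruction name to (start cycle, list of resources),
    one resource per consecutive cycle. Assumption: the schedule is the
    unstalled one, so a conflict marks where hardware would have to stall.
    """
    demand = defaultdict(list)
    for name, (start, resources) in programs.items():
        for offset, resource in enumerate(resources):
            demand[(start + offset, resource)].append(name)
    return sorted(
        (cycle, resource, names)
        for (cycle, resource), names in demand.items()
        if len(names) > 1
    )


# I2 is the Load, with an extra memory access step M before W.
programs = {
    "I1": (1, ["F", "D", "E", "W"]),
    "I2": (2, ["F", "D", "E", "M", "W"]),
    "I3": (3, ["F", "D", "E", "W"]),
}
print(find_conflicts(programs))  # [(6, 'W', ['I2', 'I3'])]
```

Under these assumptions both I2 and I3 would need the write step in cycle 6, so one of them has to be delayed.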
-
Pipeline Performance
Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
Throughput is measured by the rate at which instruction execution is completed.
Pipeline stalls degrade pipeline performance.
We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.
-
Quiz
Consider four instructions, where I2 takes two clock cycles for execution. Draw the timing diagram for a 4-stage pipeline and determine the total number of clock cycles needed for the four instructions to complete.