lec20 pipeline part3
![Page 1: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/1.jpg)
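5-Stage Pipelining
Segments S1 to S5:
- Fetch Instruction (FI)
- Decode Instruction (DI)
- Fetch Operand (FO)
- Execute Instruction (EI)
- Write Operand (WO)
[Figure: space-time diagram with time on the horizontal axis. Over nine clock cycles, S1 handles instructions 1 to 9, S2 instructions 1 to 8, S3 instructions 1 to 7, S4 instructions 1 to 6, and S5 instructions 1 to 5.]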
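The space-time diagram can be reproduced with a short simulation. The following is a minimal sketch, assuming an ideal pipeline with no stalls in which every segment takes exactly one clock cycle; the function names `space_time` and `render` are illustrative and not from the slides.

```python
def space_time(n_stages: int, n_instr: int) -> list:
    """Return one row per segment; each cell holds the instruction
    number in that segment during that cycle, or 0 when idle."""
    n_cycles = n_stages + n_instr - 1  # k + n - 1 cycles for an ideal pipeline
    table = [[0] * n_cycles for _ in range(n_stages)]
    for instr in range(1, n_instr + 1):
        for stage in range(n_stages):
            # Instruction i enters segment s (0-based) at cycle (i - 1) + s.
            table[stage][(instr - 1) + stage] = instr
    return table


def render(table: list) -> str:
    """Format the table as text, using '.' for an idle segment."""
    lines = []
    for s, row in enumerate(table, start=1):
        cells = " ".join(str(i) if i else "." for i in row)
        lines.append(f"S{s}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(space_time(n_stages=5, n_instr=5)))
```

Under these assumptions, five instructions finish in 5 + 5 - 1 = 9 cycles, compared with 25 cycles if each instruction had to complete all five segments before the next one started.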
![Page 2: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/2.jpg)
Five Stage Instruction Pipeline
- Fetch instruction
- Decode instruction
- Fetch operands
- Execute instruction
- Write result
![Page 3: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/3.jpg)
Two major difficulties:
- Data dependency
- Branch difficulties

Solutions for branch difficulties:
- Prefetch target instruction
- Delayed branch
- Branch target buffer (BTB)
- Branch prediction
![Page 4: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/4.jpg)
Data Dependency
Use Delay Load to solve it.
Example:
- load R1: R1 ← M[Addr1]
- load R2: R2 ← M[Addr2]
- ADD: R3 ← R1 + R2
- Store: M[Addr3] ← R3
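The idea behind Delay Load can be sketched as a small compiler pass. This is an illustrative sketch only: it assumes a single load delay slot (a loaded register is not yet available to the instruction immediately after the load), and the `Instr` type and `insert_load_delays` function are hypothetical names, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers or addresses read


NOP = Instr("NOP")


def insert_load_delays(program):
    """Insert a NOP after any load whose destination register is read
    by the very next instruction (assumed one-slot load delay)."""
    out = []
    for i, ins in enumerate(program):
        out.append(ins)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if ins.op == "LOAD" and nxt is not None and ins.dest in nxt.srcs:
            out.append(NOP)
    return out


program = [
    Instr("LOAD", "R1", ("Addr1",)),    # R1 <- M[Addr1]
    Instr("LOAD", "R2", ("Addr2",)),    # R2 <- M[Addr2]
    Instr("ADD", "R3", ("R1", "R2")),   # R3 <- R1 + R2
    Instr("STORE", None, ("R3",)),      # M[Addr3] <- R3
]

scheduled = insert_load_delays(program)
print([ins.op for ins in scheduled])
```

Here the ADD reads R2 directly after the second load, so the pass places one NOP between them. A smarter compiler would try to move an independent instruction into that slot instead of a NOP.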
![Page 5: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/5.jpg)
Delay Load
![Page 6: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/6.jpg)
Delay Load
![Page 7: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/7.jpg)
Example
Five instructions need to be carried out:
- Load from memory to R1
- Increment R2
- Add R3 to R4
- Subtract R5 from R6
- Branch to address X
![Page 8: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/8.jpg)
Delay Branch
![Page 9: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/9.jpg)
Rearrange the Instruction
![Page 10: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/10.jpg)
Delayed Branch
In this procedure, the compiler detects the branch instruction and rearranges the machine language code sequence by inserting useful instructions that keep the pipeline operating without interruption.
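The two options open to the compiler, padding with no-operation instructions or rearranging useful ones, can be sketched as follows. This is a simplified illustration, not a real compiler pass: it assumes two delay slots after the branch, and it assumes the instructions moved into those slots neither affect the branch nor depend on it (reasonable for an unconditional branch). The names `with_nops` and `rearranged` are hypothetical.

```python
NOP = "No-operation"


def with_nops(program, slots=2):
    """Fill the delay slots after each branch with no-operations."""
    out = []
    for ins in program:
        out.append(ins)
        if ins.startswith("Branch"):
            out.extend([NOP] * slots)
    return out


def rearranged(program, slots=2):
    """Move up to `slots` instructions that precede the first branch
    into its delay slots; pad with NOPs if too few are available.
    Assumes the moved instructions are independent of the branch."""
    b = next(i for i, ins in enumerate(program) if ins.startswith("Branch"))
    start = max(0, b - slots)
    movable = program[start:b]
    padding = [NOP] * (slots - len(movable))
    return program[:start] + [program[b]] + movable + padding + program[b + 1:]


program = [
    "Load from memory to R1",
    "Increment R2",
    "Add R3 to R4",
    "Subtract R5 from R6",
    "Branch to address X",
]

print(with_nops(program))   # 7 instructions, 2 of them wasted
print(rearranged(program))  # 5 instructions, delay slots do useful work
```

With rearrangement the branch is issued earlier, and the Add and Subtract instructions execute while the branch is still being resolved, so no cycles are lost.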
![Page 11: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/11.jpg)
Prefetch target instruction
- Prefetch the target instruction in addition to the instruction following the branch.
- If the branch condition is successful (the branch is taken), the pipeline continues from the branch target instruction.
![Page 12: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/12.jpg)
Branch target buffer (BTB)
- The BTB is an associative memory.
- Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for the branch.
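A BTB lookup can be modelled with a dictionary keyed by branch address, which mimics the associative search. The sketch below is an assumption-laden illustration: the class name, the capacity of four entries, and the first-in-first-out replacement policy are all hypothetical choices, not details given in the lecture.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch instruction's address to the
    target instruction stored for that branch."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}  # dicts preserve insertion order (used for FIFO)

    def lookup(self, pc):
        """Return the stored target instruction, or None on a miss."""
        return self.entries.get(pc)

    def record(self, branch_pc, target_instr):
        """Store a branch after it executes, evicting the oldest
        entry when the buffer is full (assumed FIFO policy)."""
        if branch_pc not in self.entries and len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[branch_pc] = target_instr


btb = BranchTargetBuffer(capacity=2)
print(btb.lookup(0x40))               # miss: branch not seen before
btb.record(0x40, "ADD R3, R1, R2")    # branch at 0x40 executed once
print(btb.lookup(0x40))               # hit: target instruction available
```

On a hit, the fetch stage can hand the stored target instruction to the pipeline immediately instead of waiting for a memory fetch from the target address.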
![Page 13: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/13.jpg)
Loop Buffer
- Very fast memory
- Maintained by the fetch stage of the pipeline
- Check the buffer before fetching from memory
- Very good for small loops or jumps
- The loop buffer is similar (in principle) to a cache dedicated to instructions. The differences are that the loop buffer only retains instructions in sequence and is much smaller in size (and lower in cost).
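The "check the buffer before fetching from memory" behaviour can be sketched with a small model. This is a rough illustration under stated assumptions: the buffer holds the most recent sequentially fetched instructions, it is flushed when a miss breaks the address sequence, and the `LoopBuffer` class, its size of eight, and the dictionary standing in for main memory are all hypothetical.

```python
from collections import OrderedDict


class LoopBuffer:
    """Toy loop buffer holding the most recent instructions
    that were fetched from consecutive addresses."""

    def __init__(self, memory, size=8):
        self.memory = memory        # address -> instruction (stand-in for main memory)
        self.size = size
        self.buf = OrderedDict()    # address -> instruction, oldest first
        self.memory_fetches = 0

    def fetch(self, addr):
        if addr in self.buf:        # hit: no memory access needed
            return self.buf[addr]
        # Miss: a non-sequential address breaks the run, so start over.
        if self.buf and addr != next(reversed(self.buf)) + 1:
            self.buf.clear()
        instr = self.memory[addr]
        self.memory_fetches += 1
        self.buf[addr] = instr
        if len(self.buf) > self.size:
            self.buf.popitem(last=False)   # drop the oldest instruction
        return instr


# A four-instruction loop at addresses 10..13, executed three times.
memory = {a: f"instr{a}" for a in range(10, 14)}
lb = LoopBuffer(memory)
for _ in range(3):
    for addr in range(10, 14):
        lb.fetch(addr)
print(lb.memory_fetches)
```

In this model only the first pass through the loop touches memory; the second and third passes are served entirely from the buffer, which is why the technique suits small loops.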
![Page 14: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/14.jpg)
Branch Prediction
A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed.
![Page 15: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/15.jpg)
Branch Prediction
Various techniques can be used to predict whether a branch will be taken or not:
- Predict never taken
- Predict always taken
- Predict by opcode
- Branch history table

The first three approaches are static: they do not depend on the execution history up to the time of the conditional branch instruction. The last approach is dynamic: it depends on the execution history.
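The difference between static and dynamic prediction can be seen on a small trace. The sketch below compares the two simplest static policies with a branch history table. The slides do not say how the table is organised; the 2-bit saturating counter per branch address used here is a common scheme chosen as an assumption, and all names are illustrative. Prediction by opcode is omitted.

```python
class BranchHistoryTable:
    """One 2-bit saturating counter per branch address (assumed scheme).
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial=1):
        self.initial = initial
        self.counters = {}

    def predict(self, pc):
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


def run_static(prediction, trace):
    """Count correct guesses when the same outcome is always predicted."""
    return sum(prediction == taken for _, taken in trace)


def run_dynamic(table, trace):
    """Count correct guesses when the table learns from each outcome."""
    correct = 0
    for pc, taken in trace:
        correct += table.predict(pc) == taken
        table.update(pc, taken)
    return correct


# A loop-closing branch at address 0x40: taken nine times, then falls through.
trace = [(0x40, True)] * 9 + [(0x40, False)]

never = run_static(False, trace)
always = run_static(True, trace)
history = run_dynamic(BranchHistoryTable(), trace)
print(never, always, history)
```

On this particular trace, always-taken does slightly better than the history table because the table needs one iteration to warm up. The dynamic scheme pays off when different branches behave differently, since a single static policy cannot suit all of them.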
![Page 16: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/16.jpg)
Floating Point Arithmetic Pipeline
- Pipeline arithmetic units are usually found in very high speed computers.
- They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems.
![Page 17: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/17.jpg)
Floating Point Arithmetic Pipeline
Example for floating-point addition and subtraction
- Inputs are two normalized floating-point binary numbers: X = A × 2^a and Y = B × 2^b
- A and B are two fractions that represent the mantissas
- a and b are the exponents
- Exercise: design the segments used to perform the "add" operation
![Page 18: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/18.jpg)
Floating Point Arithmetic Pipeline
- Compare the exponents
- Align the mantissas
- Add or subtract the mantissas
- Normalize the result
![Page 19: Lec20 Pipeline Part3](https://reader036.vdocuments.site/reader036/viewer/2022083013/5695d24a1a28ab9b0299d99c/html5/thumbnails/19.jpg)
Floating Point Arithmetic Pipeline
Example (decimal numbers are used for simplicity): X = 0.9504 × 10^3 and Y = 0.8200 × 10^2
- Segment 1 subtracts the two exponents to obtain 3 - 2 = 1. The larger exponent, 3, is chosen as the exponent of the result.
- Segment 2 shifts the mantissa of Y to the right to obtain Y = 0.0820 × 10^3. The mantissas are now aligned.
- Segment 3 produces the sum Z = 1.0324 × 10^3.
- Segment 4 normalizes the result by shifting the mantissa once to the right and incrementing the exponent by one to obtain Z = 0.10324 × 10^4.
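The four segments can be traced in code on the same numbers. The sketch below is illustrative only: it works in base 10 with Python's `decimal` module so the arithmetic matches the worked example exactly, it runs the segments one after another rather than overlapping them as hardware would, and the function name `fp_add` and the (mantissa, exponent) tuple representation are assumptions.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent) pairs in base 10,
    following the four pipeline segments in order."""
    (a_m, a_e), (b_m, b_e) = x, y

    # Segment 1: compare the exponents and pick the larger one.
    diff = a_e - b_e
    exp = max(a_e, b_e)

    # Segment 2: align the mantissas by shifting the one with the
    # smaller exponent to the right.
    if diff > 0:
        b_m = b_m * Decimal(10) ** (-diff)
    elif diff < 0:
        a_m = a_m * Decimal(10) ** diff

    # Segment 3: add the mantissas.
    z_m = a_m + b_m

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(z_m) >= 1:
        z_m = z_m / 10
        exp += 1
    while z_m != 0 and abs(z_m) < Decimal("0.1"):
        z_m = z_m * 10
        exp -= 1
    return z_m, exp


x = (Decimal("0.9504"), 3)
y = (Decimal("0.8200"), 2)
mantissa, exponent = fp_add(x, y)
print(mantissa, exponent)
```

For these inputs the result is a mantissa equal to 0.10324 with exponent 4, matching Segment 4 above. Subtraction follows the same path with a negative mantissa, in which case the second normalization loop shifts left when leading digits cancel.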