TRANSCRIPT
1
PIPELINE AND VECTOR PROCESSING
CHAPTER # 9
2
CONTENTS
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors
3
Figure 9-1
Processor with multiple functional units
Functional units connected to the processor registers:
Adder-subtractor
Integer multiply
Logic unit
Shift unit
Incrementer
Floating-point add-subtract
Floating-point multiply
Floating-point divide
The processor registers also connect to memory.
4
Classification by instruction stream and data stream:
Single instruction stream, single data stream (SISD).
Single instruction stream, multiple data stream (SIMD).
Multiple instruction stream, single data stream (MISD).
Multiple instruction stream, multiple data stream (MIMD).
5
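Figure 9-2
Example of pipelining: computing Ai * Bi + Ci.
R1 ← Ai, R2 ← Bi            Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci       Multiply and input Ci
R5 ← R3 + R4                Add Ci to product
Data path: Ai → R1 and Bi → R2 feed the Multiplier; the Multiplier output goes to R3 while Ci goes to R4; R3 and R4 feed the Adder; the Adder output goes to R5.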
6
Clock pulse   Segment 1     Segment 2        Segment 3
number        R1     R2     R3       R4      R5
1             A1     B1     ----     ----    ----
2             A2     B2     A1*B1    C1      ----
3             A3     B3     A2*B2    C2      A1*B1+C1
4             A4     B4     A3*B3    C3      A2*B2+C2
5             A5     B5     A4*B4    C4      A3*B3+C3
6             A6     B6     A5*B5    C5      A4*B4+C4
7             A7     B7     A6*B6    C6      A5*B5+C5
8             ----   ----   A7*B7    C7      A6*B6+C6
9             ----   ----   ----     ----    A7*B7+C7
Table 9-1
Content of registers in pipeline example.
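The register contents in Table 9-1 can be reproduced with a small simulation. The following is a minimal sketch, not part of the original slides: the function name `simulate_pipeline` and the use of `None` for an empty register are illustrative assumptions.

```python
def simulate_pipeline(a, b, c):
    """Clock the three-segment pipeline of Figure 9-2.

    Returns one tuple (pulse, R1, R2, R3, R4, R5) per clock pulse.
    None marks a register that holds no valid data (---- in Table 9-1).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    history = []
    for pulse in range(n + 2):  # two extra pulses drain the pipeline
        # All registers load on the same clock pulse, so compute every
        # new value from the old contents before overwriting anything.
        new_r5 = r3 + r4 if r3 is not None else None       # segment 3
        if r1 is not None:                                  # segment 2
            new_r3, new_r4 = r1 * r2, c[pulse - 1]
        else:
            new_r3, new_r4 = None, None
        if pulse < n:                                       # segment 1
            new_r1, new_r2 = a[pulse], b[pulse]
        else:
            new_r1, new_r2 = None, None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        history.append((pulse + 1, r1, r2, r3, r4, r5))
    return history


if __name__ == "__main__":
    for row in simulate_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7):
        print(row)
```

With seven operand sets the loop runs for nine pulses, matching the nine rows of the table.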
7
Figure 9-3
Four-segment pipeline.
Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with a common Clock applied to registers R1 through R4.
8
Figure 9-4
Space-time diagram for pipeline.
Clock cycle:   1   2   3   4   5   6   7   8   9
Segment 1:     T1  T2  T3  T4  T5  T6
Segment 2:         T1  T2  T3  T4  T5  T6
Segment 3:             T1  T2  T3  T4  T5  T6
Segment 4:                 T1  T2  T3  T4  T5  T6
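The space-time diagram follows a simple rule: task Tj occupies segment s during clock cycle j + s - 1, so k segments finish n tasks in k + n - 1 cycles (here 4 + 6 - 1 = 9). A minimal sketch that generates such a diagram is given below; the function name `space_time` and the use of 0 for an idle slot are illustrative assumptions.

```python
def space_time(k, n):
    """Return rows[segment][cycle] holding the task number, or 0 if idle.

    k is the number of segments, n the number of tasks; both the
    segment and cycle indices in the returned lists are 0-based.
    """
    cycles = k + n - 1
    rows = []
    for seg in range(k):
        row = []
        for cycle in range(cycles):
            task = cycle - seg + 1  # task entering this segment now
            row.append(task if 1 <= task <= n else 0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for seg, row in enumerate(space_time(4, 6), start=1):
        cells = " ".join(f"T{t}" if t else "--" for t in row)
        print(f"Segment {seg}: {cells}")
```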
9
Figure 9-5
Multiple functional units in parallel.
Instruction Ii → P1
Instruction Ii+1 → P2
Instruction Ii+2 → P3
Instruction Ii+3 → P4
10
Arithmetic Pipeline
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
11
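Inputs: exponents a and b, mantissas A and B, each loaded into a register (R).
Segment 1: Compare exponents by subtraction; the difference is passed on.
Segment 2: Choose exponent; align mantissas.
Segment 3: Add or subtract mantissas.
Segment 4: Adjust exponent; normalize result.
Registers (R) between the segments hold the intermediate results.
Figure 9-6
Pipeline for floating-point addition and subtraction.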
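The four suboperations can be sketched in software as follows. This is an illustration under stated assumptions, not the hardware design: it uses decimal mantissas normalized to 0.1 <= |mantissa| < 1, the function name `fp_add` is hypothetical, and the operand values in the usage example are chosen only for demonstration.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    # Segment 1: compare the exponents by subtraction, keep the larger.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)

    # Segment 2: align the mantissa that belongs to the smaller exponent
    # by shifting it right |diff| decimal places.
    if diff > 0:
        b_mant = b_mant.scaleb(-diff)
    elif diff < 0:
        a_mant = a_mant.scaleb(diff)

    # Segment 3: add the mantissas (pass a negative mantissa to subtract).
    mant = a_mant + b_mant

    # Segment 4: normalize the result and adjust the exponent.
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


if __name__ == "__main__":
    # 0.9504 x 10^3 + 0.8200 x 10^2
    print(fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2))
```

In a pipelined unit each of the four commented steps would be a separate segment working on a different operand pair during the same clock cycle; the function above runs them one after another for a single pair.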
12
Instruction Pipeline
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
13
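Figure 9-7
Four-segment CPU pipeline.
Segment 1: Fetch instruction from memory.
Segment 2: Decode instruction and calculate effective address. Branch? If yes, update PC and empty pipe; if no, continue to segment 3.
Segment 3: Fetch operand from memory.
Segment 4: Execute instruction. Interrupt? If yes, go to interrupt handling, then update PC and empty pipe; if no, return to segment 1 for the next instruction.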
14
FI is the segment that fetches an instruction.
DA is the segment that decodes the instruction and calculates the effective address.
FO is the segment that fetches the operand.
EX is the segment that executes the instruction.
Segments and their purpose.
15
Step:              1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1:     FI  DA  FO  EX
            2:         FI  DA  FO  EX
   (Branch) 3:             FI  DA  FO  EX
            4:                 FI  --  --  FI  DA  FO  EX
            5:                     --  --  --  FI  DA  FO  EX
            6:                                     FI  DA  FO  EX
            7:                                         FI  DA  FO  EX
Figure 9-8
Timing of instruction pipeline.
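The timing in Figure 9-8 can be modeled with a deliberately simplified rule: every instruction after the branch is fetched three steps later than it would be otherwise, because fetching resumes only after the branch leaves the EX segment. The sketch below encodes just that rule; the function name `pipeline_timing` and the `branch_at` parameter are illustrative assumptions, and the model ignores resource and data conflicts.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def pipeline_timing(n_instructions, branch_at=None):
    """Map each instruction number to {stage: step}, all 1-based.

    branch_at is the number of the (single) branch instruction, or None.
    """
    timing = {}
    for i in range(1, n_instructions + 1):
        # A branch fetched at step b executes at step b + 3, so the
        # useful fetch of every later instruction slips by 3 steps.
        delay = 3 if branch_at is not None and i > branch_at else 0
        start = i + delay
        timing[i] = {stage: start + k for k, stage in enumerate(STAGES)}
    return timing


if __name__ == "__main__":
    for instr, steps in pipeline_timing(7, branch_at=3).items():
        print(instr, steps)
```

For seven instructions with the branch in position 3, the last instruction completes at step 13, as in the figure; without the branch it would complete at step 10.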
16
Pipeline Conflicts
Resource conflicts
Data dependency conflicts
Branch difficulties
17
Three-segment instruction pipeline
I: Instruction fetch
A: ALU operation
E: Execute instruction
18
Delayed Load
LOAD:  R1 ← M[address 1]
LOAD:  R2 ← M[address 2]
ADD:   R3 ← R1 + R2
STORE: M[address 3] ← R3
19
Pipeline timing with data conflict
Clock cycles:     1  2  3  4  5  6
1. Load R1        I  A  E
2. Load R2           I  A  E
3. Add R1+R2            I  A  E
4. Store R3                I  A  E

Pipeline timing with delayed load
Clock cycles:     1  2  3  4  5  6  7
1. Load R1        I  A  E
2. Load R2           I  A  E
3. No-operation         I  A  E
4. Add R1+R2               I  A  E
5. Store R3                   I  A  E

Figure 9-9
Three-segment pipeline timing.
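The delayed-load fix in Figure 9-9 amounts to a compiler pass that places a no-operation after a load whenever the very next instruction reads the loaded register. A minimal sketch follows; the tuple encoding `(opcode, destination, sources)` and the function name `insert_delayed_load_nops` are assumptions made for illustration only.

```python
def insert_delayed_load_nops(program):
    """Return a copy of program with a NOP after each conflicting LOAD.

    Each instruction is a tuple (opcode, destination, sources).
    A conflict exists when the instruction right after a LOAD uses
    the register that the LOAD is still fetching from memory.
    """
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        opcode, dest, _sources = instr
        if opcode == "LOAD" and idx + 1 < len(program):
            next_sources = program[idx + 1][2]
            if dest in next_sources:
                out.append(("NOP", None, ()))
    return out


program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]

if __name__ == "__main__":
    for instr in insert_delayed_load_nops(program):
        print(instr)
```

Only the second load is followed by a no-operation, because R1 is already available by the time the add reaches its A segment.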
20
Figure 9-10
Examples of delayed branch.
Clock cycles:          1  2  3  4  5  6  7  8  9  10
1. Load                I  A  E
2. Increment              I  A  E
3. Add                       I  A  E
4. Subtract                     I  A  E
5. Branch to X                     I  A  E
6. No-operation                       I  A  E
7. No-operation                          I  A  E
8. Instruction in X                         I  A  E
(a) Using no-operation instructions
21
Clock cycles:          1  2  3  4  5  6  7  8
1. Load                I  A  E
2. Increment              I  A  E
3. Branch to X               I  A  E
4. Add                          I  A  E
5. Subtract                        I  A  E
6. Instruction in X                   I  A  E
Figure 9-10
Examples of delayed branch.
(b) Rearranging the instructions
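The saving from rearranging can be checked with the k + n - 1 rule for a pipeline that never stalls: with k = 3 segments, the eight-instruction sequence with no-operations needs 10 clock cycles and the six-instruction rearranged sequence needs 8. The sketch below only counts cycles; the list contents mirror the two parts of Figure 9-10, and the helper name `total_cycles` is an illustrative assumption.

```python
SEGMENTS = 3  # I, A, E


def total_cycles(program):
    """Clock cycles to run program with one instruction issued per cycle."""
    return len(program) + SEGMENTS - 1


with_nops = [
    "Load", "Increment", "Add", "Subtract",
    "Branch to X", "No-operation", "No-operation", "Instruction in X",
]

# The compiler moves Add and Subtract into the two delay slots that
# follow the branch, so no cycle is spent on a no-operation.
rearranged = [
    "Load", "Increment", "Branch to X",
    "Add", "Subtract", "Instruction in X",
]

if __name__ == "__main__":
    print(total_cycles(with_nops), total_cycles(rearranged))
```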
22
Application of Vector Processing
Long-range weather forecasting.
Petroleum explorations.
Seismic data analysis.
Medical diagnosis.
Aerodynamics and space flight simulations.
23
Figure 9-11
Instruction format for vector processor
Fields: Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
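The five fields of Figure 9-11 are enough to describe an entire loop in one instruction. The following is a minimal sketch of how such an instruction might be interpreted; the class name `VectorInstruction`, the opcode names `VADD` and `VMUL`, and the flat list used as memory are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    opcode: str      # operation code
    src1_base: int   # base address of source 1
    src2_base: int   # base address of source 2
    dest_base: int   # base address of destination
    length: int      # vector length


# Hypothetical opcodes; a real machine defines its own set.
OPERATIONS = {
    "VADD": lambda x, y: x + y,
    "VMUL": lambda x, y: x * y,
}


def execute(instr, memory):
    """Apply the operation element by element over `length` words."""
    op = OPERATIONS[instr.opcode]
    for i in range(instr.length):
        memory[instr.dest_base + i] = op(
            memory[instr.src1_base + i], memory[instr.src2_base + i]
        )


if __name__ == "__main__":
    memory = [1, 2, 3, 10, 20, 30, 0, 0, 0]
    execute(VectorInstruction("VADD", 0, 3, 6, 3), memory)
    print(memory[6:9])
```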
24
Figure 9-12
Pipeline for calculating an inner product
Source A and Source B → Multiplier pipeline → Adder pipeline
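One way to picture the inner-product pipeline is that products leave the multiplier one per cycle and are accumulated in as many interleaved partial sums as the adder pipeline has segments, with the partial sums combined at the end. The sketch below assumes a four-segment adder, which the figure itself does not state; the function name `inner_product` and the `adder_segments` parameter are illustrative.

```python
def inner_product(a, b, adder_segments=4):
    """Inner product of a and b using interleaved partial sums.

    Product i is added to partial sum i mod adder_segments, mimicking
    an adder pipeline whose output returns after adder_segments cycles.
    """
    partial = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % adder_segments] += x * y  # multiplier feeds adder
    return sum(partial)                       # combine the partial sums


if __name__ == "__main__":
    print(inner_product([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```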
25
Figure 9-13
Multiple module memory organization
Four memory modules, each with its own address register (AR), data register (DR), and memory array. All modules share a common address bus and a common data bus.
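With several modules, consecutive addresses are often spread across the modules so that successive accesses can overlap. A minimal sketch of one such mapping is shown below. It assumes low-order interleaving across four modules, which is an assumption for illustration rather than something the figure specifies; the name `module_and_offset` is hypothetical.

```python
NUM_MODULES = 4  # assumed to match the four modules in the figure


def module_and_offset(address):
    """Split an address into (module number, word within that module).

    The low-order part of the address selects the module, so
    consecutive addresses fall in different modules.
    """
    return address % NUM_MODULES, address // NUM_MODULES


if __name__ == "__main__":
    for address in range(8):
        print(address, module_and_offset(address))
```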
26
Types of Array Processors
Attached Array Processor
SIMD Array Processor
27
Figure 9-14
Attached Array Processor with host computer
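General-purpose computer ↔ Input-output interface ↔ Attached array processor
Main memory is connected to the general-purpose computer; local memory is connected to the attached array processor.
A high-speed memory-to-memory bus links main memory and local memory.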
28
Figure 9-15
SIMD array processor organization
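The master control unit is connected to main memory and to the processing elements PE1, PE2, PE3, ..., PEn.
Each processing element PEi has its own local memory Mi (M1, M2, M3, ..., Mn).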
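The organization in Figure 9-15 can be illustrated with a toy model in which the master control unit broadcasts one instruction and every processing element applies it to its own local memory. This is a sketch under simplifying assumptions: the class `ProcessingElement`, the function `broadcast`, the operation names, and the dictionary used as local memory are all hypothetical.

```python
class ProcessingElement:
    """One PE with its local memory Mi, modeled as a dict of named words."""

    def __init__(self, local_memory):
        self.memory = dict(local_memory)

    def execute(self, op, dest, src1, src2):
        x, y = self.memory[src1], self.memory[src2]
        if op == "add":
            self.memory[dest] = x + y
        elif op == "mul":
            self.memory[dest] = x * y
        else:
            raise ValueError(f"unknown operation: {op}")


def broadcast(pes, op, dest, src1, src2):
    """Master control unit: send the same instruction to every PE."""
    for pe in pes:
        pe.execute(op, dest, src1, src2)


if __name__ == "__main__":
    pes = [ProcessingElement({"a": i, "b": 10 * i}) for i in range(1, 5)]
    broadcast(pes, "add", "c", "a", "b")
    print([pe.memory["c"] for pe in pes])
```

The loop in `broadcast` runs the PEs one after another only because this is a sequential simulation; in the hardware all PEs act on the instruction simultaneously.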