multicycle operations.ppt
-
8/10/2019 MULTIcycle OPERATIONS.ppt
COMP 206:
Computer Architecture and Implementation
Montek Singh
Wed, Sep 28, 2005
Topic: Pipelining -- Intermediate Concepts
(Multicycle Operations; Exceptions)
-
Outline
Multi-cycle operations
Floating-point operations
Structural and data hazards
Interrupts, Faults and Exceptions
Precise exceptions
Complications in pipelines
READING: Appendix A
-
Pipelining Multicycle Operations
Assume five-stage pipeline
Third stage (execution) has two functional units, E1 and E2
Instruction goes through either E1 or E2, but not both
E1 and E2 are not pipelined
Stage delay of E1 = 2 cycles
Stage delay of E2 = 4 cycles
No buffering on inputs of E1 and E2
Stage delay of other stages = 1 cycle
Consider an instruction sequence of five instructions
Instructions 1, 3, 5 need E1
Instructions 2, 4 need E2
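The setup above can be checked with a small scheduling sketch. It is only an illustration under stated assumptions: instructions issue in order, an instruction holds its stage until the next one is free (no buffering), and no two instructions reach MEM in the same cycle, which holds for this five-instruction sequence. The function name `schedule` and the dictionary layout are hypothetical.

```python
def schedule(units, delay):
    """Return the first cycle each instruction spends in IF, ID, EX, MEM, WB.

    Assumes in-order issue, no buffering between stages, and that no two
    instructions reach MEM in the same cycle (true for this sequence).
    """
    unit_free = {name: 1 for name in delay}  # first cycle each unit is idle
    rows = []
    for unit in units:
        if rows:
            prev = rows[-1]
            if_start = prev["ID"]                     # IF frees when predecessor enters ID
            id_start = max(if_start + 1, prev["EX"])  # ID frees when predecessor enters EX
        else:
            if_start, id_start = 1, 2
        ex_start = max(id_start + 1, unit_free[unit])  # wait in ID while the unit is busy
        mem = ex_start + delay[unit]                   # unit is held for its full delay
        unit_free[unit] = mem
        rows.append({"IF": if_start, "ID": id_start, "EX": ex_start,
                     "MEM": mem, "WB": mem + 1})
    return rows


rows = schedule(["E1", "E2", "E1", "E2", "E1"], {"E1": 2, "E2": 4})
completion_order = sorted(range(1, 6), key=lambda i: rows[i - 1]["WB"])
print(completion_order)  # instruction numbers in order of write-back
```

Instruction 4 waits in ID for E2, and instruction 5 is held in IF behind it, so the completion order differs from the issue order.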
-
Space-Time Diagram: Multicycle Operations
Delay  Stage    1   2   3   4   5   6   7   8   9  10  11  12  13
1      IF       1   2   3   4   5   5   5
1      ID           1   2   3   4   4   4   5
2      E1               1   1   3   3           5   5
4      E2                   2   2   2   2   4   4   4   4
1      MEM                      1       3   2           5   4
1      WB                           1       3   2           5   4
Out-of-order completion
3 finishes before 2, and 5 finishes before 4
Instructions may be delayed after entering the pipeline because of structural hazards
Instructions 2 and 4 both want to use E2 unit at same time
Instruction 4 stalls in ID unit
This causes instruction 5 to stall in IF unit
-
Floating-Point Operations in MIPS
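[Figure: after IF and ID, an instruction enters one of four execution paths, then MEM and WB]
EX (integer unit)
A1 A2 A3 A4 (FP add, pipelined)
M1 M2 M3 M4 M5 M6 M7 (FP multiply, pipelined)
DIV (25) (divide, not pipelined)
Structural hazard: not fully pipelined
Structural hazard: instructions have varying running times
WAW hazards possible; WAR hazards not possible
Longer operation latency implies more frequent stalls for RAW hazards
Out-of-order completion; has ramifications for exceptions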
-
Structural Hazard on WB Unit
                               1   2   3   4   5   6   7   8   9  10  11
DIV.D (issued at t = -16)      D   D   D   D   D   D   D   D   D MEM  WB
MUL.D F0, F4, F6              IF  ID  M1  M2  M3  M4  M5  M6  M7 MEM  WB
integer instruction               IF  ID  EX MEM  WB
integer instruction                   IF  ID  EX MEM  WB
ADD.D F2, F4, F6                          IF  ID  A1  A2  A3  A4 MEM  WB
integer instruction                           IF  ID  EX MEM  WB
integer instruction                               IF  ID  EX MEM  WB
L.D F2, 0(R2)                                         IF  ID  EX MEM  WB
This is the worst-case scenario; the maximum steady-state number of write ports needed is 1
Don't replicate resources; detect and serialize access as needed
Early resolution
Track use of WB in ID stage (using a shift register as a reservation register) and stall instructions there
Simplifies pipeline control: all stalls occur in ID
Adds shift register and write-conflict logic
Late resolution
Stall instructions at entry to MEM or WB stage
Complicates pipeline control (two stall locations)
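Early resolution with a shift register can be sketched as follows. This is a minimal model, not the actual MIPS control logic: the class name `WritePortReservation` and the helper `cycles_until_wb` are hypothetical, and the distance to WB is assumed to be the number of execute cycles plus two (MEM and WB), counted from the cycle the instruction spends in ID.

```python
class WritePortReservation:
    """Shift register: bit i set means WB is already claimed i cycles from now."""

    def __init__(self):
        self.bits = 0

    def try_issue(self, cycles_until_wb):
        """Called for the instruction in ID; False means stall this cycle."""
        mask = 1 << cycles_until_wb
        if self.bits & mask:
            return False          # write port already reserved for that cycle
        self.bits |= mask         # reserve the write port
        return True

    def tick(self):
        """Advance one clock: every reservation moves one cycle closer."""
        self.bits >>= 1


def cycles_until_wb(execute_cycles):
    # ID -> execute stages -> MEM -> WB, counted from the cycle spent in ID.
    return execute_cycles + 2


port = WritePortReservation()
mul_ok = port.try_issue(cycles_until_wb(7))   # MUL.D in ID at cycle 2
for _ in range(3):                            # cycles 3, 4, 5
    port.tick()
add_ok = port.try_issue(cycles_until_wb(4))   # ADD.D in ID at cycle 5
print(mul_ok, add_ok)
```

In this run the ADD.D request is refused because it would reach WB in the same cycle as the MUL.D, so it stalls in ID for one cycle and tries again after the next `tick()`.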
-
WAW Hazards
1 2 3 4 5 6 7 8 9 10 11 12 13
DIV.D (issued at t = -16): D D D D D D D D D MEM WB
MULT.D F0, F4, F6: IF ID s M1 M2 M3 M4 M5 M6 M7 MEM WB
integer instruction: IF s ID EX MEM WB
integer instruction: IF ID EX MEM WB
ADD.D F2, F4, F6: IF ID s A1 A2 A3 A4 MEM WB
L.D F2, 0(R2): IF ID EX MEM WB
(s = stall cycle)
WAW hazard arises only when no instruction between ADD.D and L.D uses the result computed by ADD.D
Adding an instruction like ADD.D F8,F2,F4 before L.D would stall the pipeline long enough on the RAW hazard that the WAW hazard is avoided
Can happen through a branch/trap (example in HP3, Section A.9)
Rare situation, but must still handle correctly
Hazard resolution
Delay the issue of L.D until ADD.D enters MEM
Cancel write of ADD.D
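The first resolution option (delaying the issue of the later write) can be sketched as a check made in ID. This is a generalized variant, not the exact rule on the slide: instead of waiting for a fixed stage, it computes how many cycles the new instruction must wait so that the older write to the same register lands first. The function name and the `(register, cycles_until_write)` representation are assumptions for illustration.

```python
def waw_stall_cycles(in_flight, dest, cycles_to_wb):
    """Cycles the instruction in ID must stall to avoid a WAW hazard.

    in_flight: (dest_register, cycles_until_write) for already-issued instructions.
    dest, cycles_to_wb: destination and write-back distance of the new instruction.
    """
    stalls = 0
    for reg, remaining in in_flight:
        if reg == dest and remaining >= cycles_to_wb:
            # The older write would land at the same time or later:
            # wait until it is strictly ahead of the new write.
            stalls = max(stalls, remaining - cycles_to_wb + 1)
    return stalls


# ADD.D still needs 5 cycles to write F2; an L.D to F2 would write in 3.
print(waw_stall_cycles([("F2", 5)], "F2", 3))
```

The second option, cancelling the older write, needs no stall but requires the hardware to suppress the ADD.D result when it reaches WB; it is not modeled here.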
-
RAW Hazards
L: L.D F4, 0(R2)
M: MUL.D F0, F4, F6
A: ADD.D F2, F0, F8
S: S.D 0(R2), F2
D: DIV.D F12, F4, F8
        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
IF      L  M  A  A  S  S  S  S  S  S  S  D
ID         L  M  M  A  A  A  A  A  A  A  S  D
EX            L                             S  S  S  S
Mult                M  M  M  M  M  M  M
Add                                      A  A  A  A
Div                                            D  D  D  D  D  D
MEM              L                       M           A  S
WB                  L                       M           A  S
Longer delays of FP operations increase the number of stalls in response to RAW hazards
Two methods for reducing stalls
Compiler could have moved instruction D between instructions M and A, which would allow D to complete earlier; or hardware could detect this possibility and issue instruction D out of order
ID stage is a bottleneck because instructions wait there for their operands to be available; could add buffers (reservation stations) to functional units and let instructions await their operands there
-
MIPS R4000 Floating-Point Pipeline
Stage  Functional Unit  Description
A      FP adder         Mantissa ADD stage
D      FP divider       Divide pipeline stage
E      FP multiplier    Exception test stage
M      FP multiplier    First stage of multiplier
N      FP multiplier    Second stage of multiplier
R      FP adder         Rounding stage
S      FP adder         Operand shift stage
U                       Unpack FP numbers
Add, Subtract
    1  2  3  4
A      x  x
D
E
M
N
R         x  x
S      x     x
U   x
Multiply
    1  2  3  4  5  6  7  8
A                     x
D
E      x
M      x  x  x  x
N                  x  x
R                        x
S
U   x
Divide
Cycles shown: 1 2 3 4 ... 30 31 32 33 34 35 36
Marks in the displayed columns: A 4, D 6, R 4, U 1; E, M, N, S unused
-
Instruction Mixes in FP Pipeline: Adds Only
Add, Subtract
    1  2  3  4
A      x  x
D
E
M
N
R         x  x
S      x     x
U   x
Can't initiate another add on cycle 2: conflict on A in cycle 3 and on R in cycle 4
Can't initiate another add on cycle 3: conflict on S in cycle 4
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
A      x  x     y  y     x  x     y  y     x  x     y  y
D
E
M
N
R         x  x     y  y     x  x     y  y     x  x     y  y
S      x     x  y     y  x     x  y     y  x     x  y     y
U   x        y        x        y        x        y
Forbidden latencies: 1 and 2
Steady-state utilization (cycles 4 through 18)
= (5*7)/(8*15) = 35/120 = 29.17%
Total utilization (cycles 1 through 19)
= (5+5*7+2)/(8*19) = 42/152 = 27.63%
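The forbidden latencies can be derived mechanically from a reservation table: a latency is forbidden whenever some stage is used at two cycles that far apart. The sketch below assumes the add table is U in cycle 1, S in cycles 2 and 4, A in cycles 2 and 3, and R in cycles 3 and 4, a layout that is consistent with the forbidden latencies stated above; `ADD_TABLE` and `forbidden_latencies` are illustrative names.

```python
ADD_TABLE = {  # stage -> cycles in which one add occupies it (assumed layout)
    "U": {1},
    "S": {2, 4},
    "A": {2, 3},
    "R": {3, 4},
}


def forbidden_latencies(table):
    """Latencies at which a second operation would collide with the first."""
    forbidden = set()
    for cycles in table.values():
        for first in cycles:
            for second in cycles:
                if second > first:
                    # A later operation started (second - first) cycles after
                    # the first would need this stage in the same cycle.
                    forbidden.add(second - first)
    return forbidden


print(sorted(forbidden_latencies(ADD_TABLE)))  # [1, 2]
```

With latencies 1 and 2 forbidden, the smallest usable initiation interval is 3 cycles, which is the spacing used in the 19-cycle schedule above.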
-
FP Pipeline: Multiplies Only
Multiply
    1  2  3  4  5  6  7  8
A                     x
D
E      x
M      x  x  x  x
N                  x  x
R                        x
S
U   x
Collision vector: 1 1 1 1 0 0 0 0
1 indicates forbidden latency
0 indicates allowed latency
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
A                     x           y           z           x           y           z
D
E      x           y           z           x           y           z
M      x  x  x  x  y  y  y  y  z  z  z  z  x  x  x  x  y  y  y  y  z  z  z  z
N                  x  x        y  y        z  z        x  x        y  y        z  z
R                        x           y           z           x           y           z
S
U   x           y           z           x           y           z
Steady-state utilization (cycles 5-24)
= (5*10)/(8*20) = 50/160 = 31.25%
Total utilization (cycles 1-28)
= (5+5*10+5)/(8*28) = 60/224 = 26.79%
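The collision vector and both utilization figures can be recomputed from the multiply reservation table. The sketch assumes the table is U in cycle 1, E in cycle 2, M in cycles 2 to 5, N in cycles 6 and 7, A in cycle 7, and R in cycle 8, with multiplies started every 4 cycles. It also assumes the vector is listed starting from latency 0, which is always forbidden; that reading reproduces the pattern 1 1 1 1 0 0 0 0. All names are illustrative.

```python
MUL_TABLE = {"U": {1}, "E": {2}, "M": {2, 3, 4, 5}, "N": {6, 7}, "A": {7}, "R": {8}}
STAGES = 8  # A, D, E, M, N, R, S, U


def collision_vector(table, length):
    """Bit k is 1 when starting a second operation k cycles later collides."""
    vector = [0] * length
    vector[0] = 1  # two operations can never start in the same cycle
    for cycles in table.values():
        for first in cycles:
            for second in cycles:
                if 0 < second - first < length:
                    vector[second - first] = 1
    return vector


def utilization(table, starts, first_cycle, last_cycle):
    """Fraction of stage-cycles that are busy in [first_cycle, last_cycle]."""
    busy = sum(1 for start in starts
               for cycles in table.values()
               for c in cycles
               if first_cycle <= start + c - 1 <= last_cycle)
    return busy / (STAGES * (last_cycle - first_cycle + 1))


starts = [1, 5, 9, 13, 17, 21]  # six multiplies, one every 4 cycles
print(collision_vector(MUL_TABLE, 8))
print(utilization(MUL_TABLE, starts, 5, 24), utilization(MUL_TABLE, starts, 1, 28))
```

The two printed fractions correspond to the steady-state and total utilization figures above.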
-
FP Pipeline: Adds and Multiplies
Add, Subtract
    1  2  3  4
A      x  x
D
E
M
N
R         x  x
S      x     x
U   x
Multiply
    1  2  3  4  5  6  7  8
A                     x
D
E      x
M      x  x  x  x
N                  x  x
R                        x
S
U   x
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
A            a  a     m  b  b     n  a  a     m  b  b     n  a  a     m  b  b     n
D
E      m           n           m           n           m           n
M      m  m  m  m  n  n  n  n  m  m  m  m  n  n  n  n  m  m  m  m  n  n  n  n
N                  m  m        n  n        m  m        n  n        m  m        n  n
R               a  a     m  b  b     n  a  a     m  b  b     n  a  a     m  b  b     n
S            a     a     b     b     a     a     b     b     a     a     b     b
U   m     a     n     b     m     a     n     b     m     a     n     b
Note out-of-order completion
Steady-state utilization (cycles 6-21)
= (4*17)/(8*16) = 68/128 = 53.13%
Total utilization (cycles 1-28)
= (12+4*17+22)/(8*28) = 102/224 = 45.54%
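The busy-slot counts for the mixed schedule can be checked by laying the operations onto a stage-by-cycle grid. The start times used here are an assumption reconstructed from the row order in the diagram: multiplies in cycles 1, 5, ..., 21 and adds in cycles 3, 7, ..., 23, using the same add and multiply reservation tables as on the previous slides.

```python
from collections import Counter

ADD = {"U": [1], "S": [2, 4], "A": [2, 3], "R": [3, 4]}
MUL = {"U": [1], "E": [2], "M": [2, 3, 4, 5], "N": [6, 7], "A": [7], "R": [8]}


def occupancy(schedule):
    """Count how many operations use each (stage, cycle) slot."""
    slots = Counter()
    for table, start in schedule:
        for stage, cycles in table.items():
            for c in cycles:
                slots[(stage, start + c - 1)] += 1
    return slots


# Assumed start cycles: multiplies every 4 cycles from 1, adds every 4 from 3.
schedule = [(MUL, s) for s in range(1, 22, 4)] + [(ADD, s) for s in range(3, 24, 4)]
slots = occupancy(schedule)
conflicts = [slot for slot, n in slots.items() if n > 1]
busy = len(slots)
steady = sum(1 for (_, cycle) in slots if 6 <= cycle <= 21)
print(conflicts, steady, busy, busy / (8 * 28))
```

Under these assumptions no slot is claimed twice, the steady-state window holds 68 busy slots, and the busy total over 28 cycles is 12 + 68 + 22 = 102.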
-
Interrupts, Faults, or Exceptions
Synchronous, coerced interrupts that occur within instructions and after which execution must resume are the hardest to implement
See Figure A.27 in HP3
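Exception type  Sync vs. async  User request vs. coerced  Within vs. between instr.  Resume vs. terminate
I/O request     Async           Coerced                   Between instr.             Resume
OS call         Sync            User request              Between instr.             Resume
Breakpoint      Sync            User request              Between instr.             Resume
Power fail      Async           Coerced                   Within instr.              Terminate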
-
Problems on Sequential Processors
Instruction modifies state early, then causes an interrupt
State change must be undone
Example: First operand of VAX instruction uses autodecrement addressing mode, which writes a register. Trying to access second operand causes a page fault. Since instruction execution cannot be completed, we must restore the register written by autodecrement to its original value
Long-running instructions
Not enough to be able to restore state, must make progress from interrupt to interrupt
Example: MVC on IBM 360 copies 256 bytes
No virtual memory, so interrupts not allowed to stop MVC
Example: MVC on IBM 370 copies 256 bytes
Has virtual memory, so first access all pages involved; after that, no interrupts allowed
Example: MVCL on IBM 370 copies up to 2^24 bytes
Has VM; two addresses and length are in registers
Registers saved and restored on interrupts (making progress)
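The MVCL idea, keeping the addresses and remaining length in registers so that a restarted instruction continues rather than starts over, can be sketched as below. This is a toy model, not IBM 370 semantics: residency is tracked per address instead of per page, the "OS" simply marks the faulting address resident, and all names (`mvcl`, `run_with_restarts`, `PageFault`) are hypothetical.

```python
class PageFault(Exception):
    """Raised when the copy touches an address that is not resident."""


def mvcl(memory, regs, resident):
    """Copy regs["len"] bytes from regs["src"] to regs["dst"].

    The registers are updated after every byte, so the state saved at an
    interrupt already describes the remaining work.
    """
    while regs["len"] > 0:
        for addr in (regs["src"], regs["dst"]):
            if addr not in resident:
                raise PageFault(addr)
        memory[regs["dst"]] = memory[regs["src"]]
        regs["src"] += 1
        regs["dst"] += 1
        regs["len"] -= 1


def run_with_restarts(memory, regs, resident):
    """Re-execute the same instruction after each fault; return the fault count."""
    faults = 0
    while True:
        try:
            mvcl(memory, regs, resident)
            return faults
        except PageFault as fault:
            resident.add(fault.args[0])  # stand-in for the OS paging the address in
            faults += 1


memory = list(range(8)) + [0] * 8
regs = {"src": 0, "dst": 8, "len": 8}
faults = run_with_restarts(memory, regs, set(range(16)) - {3, 13})
print(faults, memory[8:], regs)
```

Each restart resumes from the register values left by the previous attempt, so bytes already copied are never repeated and the instruction always makes progress between interrupts.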
-
Interrupts in MIPS Pipeline
Pipeline stage  Problem exceptions
IF              Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID              Undefined or illegal opcode
EX              Arithmetic exception
MEM             Page fault on data fetch; misaligned memory access; memory-protection violation
WB              None
How do we stop and restart execution on an interrupt to keep it precise?
What problems do delayed branches cause?
What happens if multiple exceptions occur in the pipeline?
Can exceptions occur out-of-order?
What problems do multi-cycle instructions cause?
-
Complications with Delayed Branches
                1  2  3  4  5  6  7  8  9
1 branch        F  D  X  M  W
2 delay slot       F  D  X  M  W
u BTA                 F  D  X  M  W
u+1                      F  D  X  M  W
u+2                         F  D  X  M  W
Suppose instruction 2 causes an exception (e.g., a page fault) after the taken branch completes (determining that the branch outcome is true)
Instruction 2 cannot complete
Neither can instruction u
On restart, we do not have sequential execution
We must remember two PC values: 2 and u
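Why two PC values are needed can be seen by listing the fetch addresses after the handler returns. The addresses are hypothetical: the delay-slot instruction (instruction 2) is placed at 104, the branch target u at 200, and instructions are assumed to be 4 bytes long.

```python
def fetch_after_restart(saved_pcs, count, step=4):
    """Return the first `count` fetch addresses after the handler returns.

    saved_pcs holds the addresses of the next instructions to execute, oldest
    first; fetching continues sequentially after the last saved address.
    """
    order = list(saved_pcs[:count])
    pc = saved_pcs[-1] + step
    while len(order) < count:
        order.append(pc)
        pc += step
    return order


print(fetch_after_restart([104, 200], 4))  # delay slot, then the branch target
print(fetch_after_restart([104], 4))       # one saved PC: falls through, branch is lost
```

With only the delay-slot address saved, execution would continue sequentially past the delay slot and the already-resolved taken branch would be forgotten.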
-
Complications with Multicycle Operations
                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
DIVF F0, F2, F4       F  D  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  M  W
ADDF F10, F10, F8        F  D  X  X  X  X  M  W
SUBF F12, F12, F14          F  D  X  X  X  X  M  W
Instructions are independent (no hazards) and therefore issue immediately
Differences in running times cause out-of-order termination
DIVF throws arithmetic exception late in its execution
At that point, ADDF and SUBF have both completed execution and destroyed one of their operands
Can we maintain precise interrupts under these conditions?
-
FP Pipeline Exceptions: Solns. 1 and 2
Settle for imprecise interrupts (CRAY, with checkpointing)
Done on Alpha 21064 and 21164, IBM Power-1 and Power-2, MIPS R8000 by supporting a fast imprecise mode and a slow precise mode
Not an option if you have to support virtual memory or IEEE floating point standard
Software finishes certain instructions (SPARC)
Keep enough state around for trap handler to create a precise sequence for exception and finish work for some instruction stages
Only FP instructions cause this problem
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
F D X X X X X X X X X X X X X X X M W
F D X X X X X X X X M W
F D X X X X X X X X M W
F D X X X X M W
-
FP Pipeline Exceptions: Solns. 3 and 4
Stalling (MIPS R2000/3000, MIPS R4000, Pentium)
An instruction is allowed to issue only if it is certain that all the instructions before the issuing instruction will complete without causing an exception
To prevent excessive stalling, FP units must decide on possibility of exceptions early in pipeline
General methods (PowerPC 620, MIPS R10000)
Reorder buffer, history file, future file
An instruction is allowed to finalize its writes only when all previously issued instructions are complete
More naturally used in connection with ILP (Chapter 4)
Significant complexity (to be discussed later)
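The reorder-buffer rule (results may finish out of order, but writes are finalized in issue order) can be sketched in a few lines. This is a simplified illustration, not the PowerPC 620 or R10000 design: the class and method names are hypothetical, register renaming and operand forwarding are left out, and an exception is modeled as a Python `ArithmeticError` raised at commit.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # issued instructions, oldest first
        self.regs = {}          # architectural register file

    def issue(self, dest):
        entry = {"dest": dest, "value": None, "done": False, "exc": None}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exc=None):
        """A functional unit finishes, possibly out of program order."""
        entry.update(value=value, done=True, exc=exc)

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exc"] is not None:
                self.entries.clear()  # squash all younger instructions
                raise ArithmeticError(head["exc"])
            self.regs[head["dest"]] = head["value"]


rob = ReorderBuffer()
div = rob.issue("F0")
add = rob.issue("F10")
sub = rob.issue("F12")
rob.complete(add, 1.0)
rob.complete(sub, 2.0)
rob.commit()                # nothing retires: the older divide is still running
print(rob.regs)
```

If the divide later completes with an exception, `commit()` raises before the add or subtract results reach the register file, so the architectural state is the one that existed before the divide.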