CSC 4250 Computer Architectures
October 20, 2006
Chapter 3. Instruction-Level Parallelism
& Its Dynamic Exploitation
One More Example on Tomasulo’s Algorithm
L.D F0,0(R0)
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2
IBM 360 Assembly Language
Only two operands. Advantage? Disadvantage? Example:
L.D F0,0(R0)
ADD.D F0,F2
MUL.D F0,F4
ADD.D F0,F2
MUL.D F0,F4
S.D F0,0(R0)
… …
Figure 0.1
Instruction Issue Execute Write Result
L.D F0,0(R0) √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No
Store1 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Load1
Figure 0.2
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 No
Add3 No
Mult1 No
Mult2 No
Store1 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Add1
Figure 0.3
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 No
Add3 No
Mult1 Yes Mult Reg[F4] Add1
Mult2 No
Store1 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Mult1
Figure 0.4
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 Yes Add Reg[F2] Mult1
Add3 No
Mult1 Yes Mult Reg[F4] Add1
Mult2 No
Store1 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Add2
Figure 0.5
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
S.D F0,0(R0)
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 Yes Add Reg[F2] Mult1
Add3 No
Mult1 Yes Mult Reg[F4] Add1
Mult2 Yes Mult Reg[F4] Add2
Store1 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Mult2
Figure 0.6
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
S.D F0,0(R0) √
ADD.D F0,F4,F2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 Yes Add Reg[F2] Mult1
Add3 No
Mult1 Yes Mult Reg[F4] Add1
Mult2 Yes Mult Reg[F4] Add2
Store1 Yes Store Mult2 0+Reg[R0]
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Mult2
Figure 0.7
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
S.D F0,0(R0) √
ADD.D F0,F4,F2 √
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 Yes Add Reg[F2] Mult1
Add3 Yes Add Reg[F4] Reg[F2]
Mult1 Yes Mult Reg[F4] Add1
Mult2 Yes Mult Reg[F4] Add2
Store1 Yes Store Mult2 0+Reg[R0]
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Add3
Figure 0.8
Instruction Issue Execute Write Result
L.D F0,0(R0) √ √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
ADD.D F0,F0,F2 √
MUL.D F0,F0,F4 √
S.D F0,0(R0) √
ADD.D F0,F4,F2 √ √ √
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load 0+Reg[R0]
Add1 Yes Add Reg[F2] Load1
Add2 Yes Add Reg[F2] Mult1
Add3 No
Mult1 Yes Mult Reg[F4] Add1
Mult2 Yes Mult Reg[F4] Add2
Store1 Yes Store Mult2 0+Reg[R0]
F0 F2 F4 F6 F8 F10 F12 … F30
Qi
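The issue steps traced in Figures 0.1 to 0.7 can be replayed with a small Python sketch. It models only the issue step and the register-status (Qi) update, with no timing. The names `PROGRAM`, `UNIT`, `issue_all`, and `write_result` are illustrative assumptions, not part of the slides. The sketch also assumes that no reservation station is freed while the seven instructions issue, which is the situation shown in the figures.

```python
# Sketch of Tomasulo issue + register-status bookkeeping (no timing modeled).
# Each entry: (opcode, destination register or None, FP source registers).
PROGRAM = [
    ("L.D",   "F0", []),            # L.D   F0,0(R0)
    ("ADD.D", "F0", ["F0", "F2"]),  # ADD.D F0,F0,F2
    ("MUL.D", "F0", ["F0", "F4"]),  # MUL.D F0,F0,F4
    ("ADD.D", "F0", ["F0", "F2"]),  # ADD.D F0,F0,F2
    ("MUL.D", "F0", ["F0", "F4"]),  # MUL.D F0,F0,F4
    ("S.D",   None, ["F0"]),        # S.D   F0,0(R0): reads F0, writes no register
    ("ADD.D", "F0", ["F4", "F2"]),  # ADD.D F0,F4,F2
]

# Which kind of reservation station each opcode uses.
UNIT = {"L.D": "Load", "ADD.D": "Add", "MUL.D": "Mult", "S.D": "Store"}


def issue_all(program):
    """Issue every instruction in order; return (trace, final Qi table)."""
    qi = {}      # register -> station that will produce its value
    used = {}    # station kind -> how many stations are allocated so far
    trace = []
    for op, dest, srcs in program:
        kind = UNIT[op]
        used[kind] = used.get(kind, 0) + 1
        station = f"{kind}{used[kind]}"
        # A pending source is recorded as a station tag (Q field);
        # an available source is read from the register file (V field).
        operands = [qi.get(reg, f"Reg[{reg}]") for reg in srcs]
        if dest is not None:
            qi[dest] = station       # rename: later readers wait on this station
        trace.append((station, operands, qi.get("F0")))
    return trace, qi


def write_result(qi, station):
    """Broadcast from `station`: clear only registers still tagged with it."""
    for reg in [r for r, s in qi.items() if s == station]:
        del qi[reg]


trace, qi = issue_all(PROGRAM)
for station, operands, f0 in trace:
    print(station, operands, "Qi[F0] =", f0)
```

The last instruction gets both operands from the register file, so it can execute and write its result while the earlier chain is still waiting on the load, as in Figure 0.8. Calling `write_result(qi, "Add3")` then clears the Qi entry for F0. A later broadcast from `Mult2` leaves F0 untouched because its tag no longer matches.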
Modified Loop-Based Example
Loop: L.D F0,0(R1)
MUL.D F0,F0,F2
ADD.D F0,F0,F4
S.D F0,0(R1)
DADDIU R1,R1,#−8
BNE R1,R2,Loop
Figure 0.1. One active iteration of loop
Instruction Iteration Issue Execute Write Result
L.D F0,0(R1) 1 √ √
MUL.D F0,F0,F2 1 √
ADD.D F0,F0,F4 1 √
S.D F0,0(R1) 1 √
L.D F0,0(R1) 2
MUL.D F0,F0,F2 2
ADD.D F0,F0,F4 2
S.D F0,0(R1) 2
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Reg[R1]
Load2 No
Add1 Yes Add Reg[F4] Mult1
Add2 No
Mult1 Yes Mult Reg[F2] Load1
Mult2 No
Store1 Yes Store Add1 Reg[R1]
Store2 No
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Add1
Figure 0.2. Two active iterations of loop
Instruction Iteration Issue Execute Write Result
L.D F0,0(R1) 1 √ √
MUL.D F0,F0,F2 1 √
ADD.D F0,F0,F4 1 √
S.D F0,0(R1) 1 √
L.D F0,0(R1) 2 √ √
MUL.D F0,F0,F2 2 √
ADD.D F0,F0,F4 2 √
S.D F0,0(R1) 2 √
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Reg[R1]
Load2 Yes Load Reg[R1]-8
Add1 Yes Add Reg[F4] Mult1
Add2 Yes Add Reg[F4] Mult2
Mult1 Yes Mult Reg[F2] Load1
Mult2 Yes Mult Reg[F2] Load2
Store1 Yes Store Add1 Reg[R1]
Store2 Yes Store Add2 Reg[R1]-8
F0 F2 F4 F6 F8 F10 F12 … F30
Qi Add2
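The two-iteration state above amounts to dynamic loop unrolling: each iteration gets its own reservation stations, so the second copy of the loop body never waits on the first. A minimal Python sketch of the issue step follows. The names `BODY`, `UNIT`, and `issue_iterations` are illustrative assumptions. The sketch also assumes that DADDIU and BNE are handled by the integer unit and are not tracked, and that the branch is predicted taken.

```python
# Sketch: issue the FP part of the loop body for several iterations.
# Each entry: (opcode, destination register or None, FP source registers).
BODY = [
    ("L.D",   "F0", []),            # L.D   F0,0(R1)
    ("MUL.D", "F0", ["F0", "F2"]),  # MUL.D F0,F0,F2
    ("ADD.D", "F0", ["F0", "F4"]),  # ADD.D F0,F0,F4
    ("S.D",   None, ["F0"]),        # S.D   F0,0(R1)
]

UNIT = {"L.D": "Load", "MUL.D": "Mult", "ADD.D": "Add", "S.D": "Store"}


def issue_iterations(n):
    """Issue n iterations in order; return (station rows, final Qi table)."""
    qi = {}      # register -> station that will produce its value
    used = {}    # station kind -> stations allocated so far
    rows = []
    for it in range(n):
        # R1 drops by 8 per iteration; written relative to its initial value.
        addr = "Reg[R1]" if it == 0 else f"Reg[R1]-{8 * it}"
        for op, dest, srcs in BODY:
            kind = UNIT[op]
            used[kind] = used.get(kind, 0) + 1
            station = f"{kind}{used[kind]}"
            rows.append({
                "station": station,
                "iteration": it + 1,
                # Sources already available are read as values (V fields).
                "values": [f"Reg[{r}]" for r in srcs if r not in qi],
                # Pending sources are recorded as station tags (Q fields).
                "waits": [qi[r] for r in srcs if r in qi],
                "addr": addr if kind in ("Load", "Store") else None,
            })
            if dest is not None:
                qi[dest] = station   # newest writer of F0 wins
    return rows, qi


rows, qi = issue_iterations(2)
for row in rows:
    print(row)
print("Qi:", qi)
```

With two iterations the second multiply waits on `Load2` rather than on anything from iteration 1, and the register status for F0 ends at `Add2`, as in the figure.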
Dynamic Branch Prediction
Static branch prediction: see Appendix A
Branch Prediction Buffer: a small memory indexed by the lower portion of the address of the branch instruction. Each entry contains a bit that says whether the branch was recently taken or not
The prediction bit may have been placed there by another branch whose address has the same low-order bits
Figure 3.14. A Branch Prediction Buffer
Use the 4 low-order address bits of the branch (word address) to choose a row.
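A 1-bit branch prediction buffer of this shape can be sketched in a few lines of Python. The class name `BranchPredictionBuffer`, the initial state (predict not taken), and the example addresses are assumptions made for illustration. Instructions are assumed to be 4 bytes wide, so the word address is the byte address shifted right by 2.

```python
class BranchPredictionBuffer:
    """1-bit predictor indexed by the low-order bits of the word address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One bit per row; False means "predict not taken" (assumed start state).
        self.bits = [False] * (1 << index_bits)

    def row(self, branch_addr):
        # Byte address -> word address (4-byte instructions), keep the low bits.
        return (branch_addr >> 2) & self.mask

    def predict(self, branch_addr):
        return self.bits[self.row(branch_addr)]

    def update(self, branch_addr, taken):
        # Remember only the most recent outcome of whatever branch hit this row.
        self.bits[self.row(branch_addr)] = taken


bpb = BranchPredictionBuffer()
# Two hypothetical branches whose word addresses share the same 4 low bits.
a, b = 0x1008, 0x1048
bpb.update(a, True)                 # branch at `a` was taken
print(bpb.row(a), bpb.row(b))       # both map to the same row
print(bpb.predict(b))               # prediction for `b` comes from `a`
```

The buffer has no tags, so the branch at `b` silently inherits the bit left by the branch at `a`. The prediction is only a hint and is checked when the branch resolves.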
Nested Loops
Loop1: L.D F2,1600(R1)
DADDIU R2,R0,#80
Loop2: L.D F0,1000(R2)
ADD.D F0,F0,F2
S.D F0,1000(R2)
DADDIU R2,R2,#−8
BNEZ R2,Loop2
DADDIU R1,R1,#−8
BNEZ R1,Loop1
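How a 1-bit predictor behaves on the inner branch (BNEZ R2,Loop2) can be checked with a short Python simulation. The inner loop starts with R2 = 80 and steps by 8, so the branch is taken 9 times and then falls through once. The initial value of R1 is not given, so the number of outer iterations is a free parameter here. The function names, the initial prediction bit, and the assumption that the inner branch has a buffer row to itself are all illustrative.

```python
def inner_branch_outcomes(start=80, step=8):
    """Outcomes of BNEZ R2,Loop2 for one pass over the inner loop."""
    r2 = start
    outcomes = []
    while True:
        r2 -= step                  # DADDIU R2,R2,#-8
        taken = r2 != 0             # BNEZ R2,Loop2
        outcomes.append(taken)
        if not taken:
            return outcomes


def count_mispredictions(outer_iters, initial_bit=False):
    """Run a 1-bit predictor over the inner branch; return (misses, total)."""
    bit = initial_bit               # assumed start state: predict not taken
    misses = total = 0
    for _ in range(outer_iters):
        for taken in inner_branch_outcomes():
            total += 1
            if bit != taken:
                misses += 1
            bit = taken             # 1-bit scheme: remember the last outcome only
    return misses, total


misses, total = count_mispredictions(outer_iters=5)
print(f"{misses} mispredictions out of {total} inner-branch executions")
```

Under these assumptions the predictor misses twice per pass over the inner loop: once on the final fall-through, and once on re-entry because the bit still records that exit. The branch is taken 90% of the time, yet the prediction is right only 80% of the time.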