Hardware based speculation - bt.nitk.ac.in…

Post on 06-Apr-2020


Hardware based Speculation

● Execute instructions along predicted execution paths, but only commit the results if the prediction was correct

● Instruction commit: allowing an instruction to update the register file once the instruction is no longer speculative

● Need an additional piece of hardware to prevent any irrevocable action until an instruction commits

– Reorder Buffer (ROB)

● In-order commit

● Stores instruction results before the instruction commits

● Clear ROB on misprediction

● Exceptions
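The reorder buffer behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real processor: the class names `ROBEntry` and `ReorderBuffer`, the `mispredicted` flag and the register values are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    instr: str                    # instruction text, for display only
    dest: Optional[str]           # destination register, or None (e.g. a branch)
    value: Optional[float] = None
    ready: bool = False           # True once the result has been written to the ROB
    mispredicted: bool = False    # set on a branch whose prediction was wrong


class ReorderBuffer:
    def __init__(self, regs):
        self.regs = regs          # architectural register file (a dict)
        self.entries = deque()    # oldest instruction at the left

    def issue(self, instr, dest=None):
        entry = ROBEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        # The result is held in the ROB; the register file is not touched yet.
        entry.value = value
        entry.mispredicted = mispredicted
        entry.ready = True

    def commit(self):
        """Commit from the head only; return the committed instruction or None."""
        if not self.entries or not self.entries[0].ready:
            return None           # head not finished: younger results must wait
        head = self.entries.popleft()
        if head.mispredicted:
            self.entries.clear()  # squash everything younger than the branch
        elif head.dest is not None:
            self.regs[head.dest] = head.value
        return head.instr


regs = {"F0": 0.0, "F4": 0.0}
rob = ReorderBuffer(regs)
ld = rob.issue("L.D F0, 0(R1)", "F0")
mul = rob.issue("MUL.D F4, F0, F2", "F4")
bne = rob.issue("BNE R1, R2, LOOP")
spec = rob.issue("L.D F0, 0(R1)", "F0")   # speculative, next iteration

rob.write_result(mul, 6.0)                # finishes out of order
blocked = rob.commit()                    # None: the load at the head is not ready
rob.write_result(ld, 3.0)
rob.write_result(spec, 99.0)
rob.write_result(bne, mispredicted=True)
committed = [rob.commit(), rob.commit(), rob.commit()]
```

In this sketch the speculative second load writes its result into the ROB but never reaches the register file, because the mispredicted branch clears the buffer when it commits.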

Tomasulo's Algorithm with Speculation

ROB – Loop Based Example

ROB

Entry  Busy  Instruction          State         Destination  Value
1      no    L.D F0, 0(R1)        Commit        F0           Mem[0+Regs[R1]]
2      no    MUL.D F4, F0, F2     Commit        F4           #1 * Regs[F2]
3      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #2
4      yes   DADDIU R1, R1, #-8   Write Result  R1           Regs[R1]-8
5      yes   BNE R1, R2, LOOP     Write Result  -            -
6      yes   L.D F0, 0(R1)        Write Result  F0           Mem[#4]
7      yes   MUL.D F4, F0, F2     Write Result  F4           #6 * Regs[F2]
8      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #7
9      yes   DADDIU R1, R1, #-8   Write Result  R1           #4 - #8
10     yes   BNE R1, R2, LOOP     Write Result  -            -

Multiple Issue and Static Scheduling

To achieve CPI < 1, need to complete multiple instructions per clock

● Statically scheduled superscalar processors

● VLIW (Very Long Instruction Word) processors

● Dynamically scheduled superscalar processors

Multiple Issue Processors

Dynamic Scheduling + Multiple Issue + Speculation

Limit the number of instructions of a given class that can be issued in a “bundle”, i.e., one FP, one integer, one load, one store

Examine all the dependencies among the instructions in the bundle

Also need multiple completion/commit
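The two issue-time checks above (per-class limits and dependences inside the bundle) can be sketched as follows. This is an illustrative sketch only: the function name `check_bundle`, the tuple layout and the `CLASS_LIMIT` table are assumptions, with the limits taken from the slide's "one FP, one integer, one load, one store" example.

```python
from collections import Counter

# Assumed per-bundle limits, following the slide's example: one of each class.
CLASS_LIMIT = {"fp": 1, "int": 1, "load": 1, "store": 1}


def check_bundle(bundle):
    """bundle: list of (class, dest, sources) tuples in program order.

    Returns (ok, deps). ok is False if any class limit is exceeded.
    deps lists (producer_index, consumer_index, register) for each
    read-after-write dependence found inside the bundle.
    """
    counts = Counter(cls for cls, _, _ in bundle)
    ok = all(counts[cls] <= CLASS_LIMIT.get(cls, 0) for cls in counts)

    deps = []
    for j, (_, _, sources) in enumerate(bundle):
        for src in sources:
            # Find the most recent earlier writer of src inside the bundle.
            for i in range(j - 1, -1, -1):
                if bundle[i][1] == src:
                    deps.append((i, j, src))
                    break
    return ok, deps


bundle = [
    ("load", "R2", ["R1"]),         # LD     R2, 0(R1)
    ("int", "R2", ["R2"]),          # DADDIU R2, R2, #1
    ("store", None, ["R2", "R1"]),  # SD     R2, 0(R1)
]
ok, deps = check_bundle(bundle)
too_many_ints, _ = check_bundle([("int", "R1", []), ("int", "R2", [])])
```

Here the first bundle respects the class limits but carries two dependences on R2, which the issue logic would have to resolve (for example by passing reservation-station tags between the instructions); the second bundle is rejected because it holds two integer operations.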

Dynamic Scheduling + Multiple Issue

2-way Superscalar (all entries are clock numbers)

Iter  Instruction         Issues at  Executes at  Mem Access at  Write CDB at
1     LD R2, 0(R1)        1          2            3              4
1     DADDIU R2, R2, #1   1          5            -              6
1     SD R2, 0(R1)        2          3            7              -
1     DADDIU R1, R1, #8   2          3            -              4
1     BNE R2, R3, L       3          7            -              -
2     LD R2, 0(R1)        4          8            9              10
2     DADDIU R2, R2, #1   4          11           -              12
2     SD R2, 0(R1)        5          9            13             -
2     DADDIU R1, R1, #8   5          8            -              9
2     BNE R2, R3, L       6          13           -              -
3     LD R2, 0(R1)        7          14           15             16
3     DADDIU R2, R2, #1   7          17           -              18
3     SD R2, 0(R1)        8          15           19             -
3     DADDIU R1, R1, #8   8          14           -              15
3     BNE R2, R3, L       9          19           -              -

Dynamic Scheduling + Multiple Issue + Speculation

2-way Superscalar (all entries are clock numbers)

Iter  Instruction         Issues at  Executes at  Mem Access at  Write CDB at  Commits at
1     LD R2, 0(R1)        1          2            3              4             5
1     DADDIU R2, R2, #1   1          5            -              6             7
1     SD R2, 0(R1)        2          3            7              -             -
1     DADDIU R1, R1, #8   2          3            -              4             8
1     BNE R2, R3, L       3          7            -              -             8
2     LD R2, 0(R1)        4          5            6              7             9
2     DADDIU R2, R2, #1   4          8            -              9             10
2     SD R2, 0(R1)        5          6            10             -             -
2     DADDIU R1, R1, #8   5          6            -              7             11
2     BNE R2, R3, L       6          10           -              -             11
3     LD R2, 0(R1)        7          8            9              10            12
3     DADDIU R2, R2, #1   7          11           -              12            13
3     SD R2, 0(R1)        8          9            13             -             -
3     DADDIU R1, R1, #8   8          9            -              10            14
3     BNE R2, R3, L       9          13           -              -             14
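The effect of speculation on the two 2-way superscalar schedules can be estimated with a small calculation. The sketch below assumes the BNE of each iteration executes at clocks 7, 13, 19 without speculation and at clocks 7, 10, 13 with it; the helper name `steady_state_ipc` is hypothetical, and the gap between consecutive branches is used as a rough measure of clocks per loop iteration.

```python
# Clock at which each iteration's BNE executes (assumed values, see lead-in).
bne_exec_no_spec = [7, 13, 19]
bne_exec_spec = [7, 10, 13]
INSTRS_PER_ITER = 5   # LD, DADDIU, SD, DADDIU, BNE


def steady_state_ipc(bne_clocks):
    """Instructions per clock, using the spacing between successive branches."""
    gaps = [later - earlier for earlier, later in zip(bne_clocks, bne_clocks[1:])]
    clocks_per_iter = sum(gaps) / len(gaps)
    return INSTRS_PER_ITER / clocks_per_iter


ipc_no_spec = steady_state_ipc(bne_exec_no_spec)   # 5 instructions every 6 clocks
ipc_spec = steady_state_ipc(bne_exec_spec)         # 5 instructions every 3 clocks
cpi_no_spec = 1 / ipc_no_spec
cpi_spec = 1 / ipc_spec
```

Under these assumptions only the speculative schedule reaches CPI < 1 (about 0.6 against about 1.2), because without speculation each load has to wait for the preceding branch to execute.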

Literature on Processors

● Efficient Reading of Papers in Science and Technology

● Yeager, The MIPS R10000 Processor, MICRO, 1996.

● Hinton et al., The Microarchitecture of the Pentium 4 Processor. Intel Technology Journal Q1, 2001.

● Smith and Sohi. Microarchitecture of Superscalar Processors. Proc. of IEEE. 1995.

● Kahle et al. Introduction to the Cell multiprocessor. IBM J. RES. & DEV. 2005.

● Hammarlund et al., Haswell: The fourth generation Intel Processor, MICRO 2014.

Extra

Tomasulo's - Loop based Example (step 1)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   yes   Load   -                   -         -      -      Regs[R1] + 0
Load2   yes   Load   -                   -         -      -      Regs[R1] - 8
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    -                   Regs[F2]  Load1  -      -
Mult2   yes   MUL    -                   Regs[F2]  Load2  -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 2)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   yes   Load   -                   -         -      -      Regs[R1] + 0
Load2   yes   Load   -                   -         -      -      Regs[R1] - 8
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    -                   Regs[F2]  Load1  -      -
Mult2   yes   MUL    -                   Regs[F2]  Load2  -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 3)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   yes   Load   -                   -         -      -      Regs[R1] - 8
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]   Regs[F2]  -      -      -
Mult2   yes   MUL    -                   Regs[F2]  Load2  -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 4)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]   Regs[F2]  -      -      -
Mult2   yes   MUL    -                   Regs[F2]  Load2  -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 5)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]   Regs[F2]  -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]   Regs[F2]  -      -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 6)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]   Regs[F2]  -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]   Regs[F2]  -      -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 7)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]   Regs[F2]  -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]   Regs[F2]  -      -      -
Store1  yes   Store  -                   -         -      Mult1  Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 8)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   no    -      -                   -         -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]   Regs[F2]  -      -      -
Store1  yes   Store  -                   Mul[F4]   -      -      Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 9)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk        Qj     Qk     A
Load1   no    -      -                   -         -      -      -
Load2   no    -      -                   -         -      -      -
Add1    no    -      -                   -         -      -      -
Mult1   no    -      -                   -         -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]   Regs[F2]  -      -      -
Store1  yes   Store  -                   Mul[F4]   -      -      Regs[R1]+0
Store2  yes   Store  -                   -         -      Mult2  Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 10)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk                 Qj     Qk     A
Load1   no    -      -                   -                  -      -      -
Load2   no    -      -                   -                  -      -      -
Add1    no    -      -                   -                  -      -      -
Mult1   no    -      -                   -                  -      -      -
Mult2   no    -      -                   -                  -      -      -
Store1  no    -      -                   -                  -      -      -
Store2  yes   Store  -                   Mem[Regs[R1] - 8]  -      -      Regs[R1]-8

Register Status

Field  F0  F2  F4     F6  F8  F10  F12  ...  F30
Qi     -   -   Mult2  -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 11)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √  √√

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk                 Qj     Qk     A
Load1   no    -      -                   -                  -      -      -
Load2   no    -      -                   -                  -      -      -
Add1    no    -      -                   -                  -      -      -
Mult1   no    -      -                   -                  -      -      -
Mult2   no    -      -                   -                  -      -      -
Store1  no    -      -                   -                  -      -      -
Store2  yes   Store  -                   Mem[Regs[R1] - 8]  -      -      Regs[R1]-8

Register Status

Field  F0  F2  F4  F6  F8  F10  F12  ...  F30
Qi     -   -   -   -   -   -    -    ...  -

Tomasulo's - Loop based Example (step 12)

Instruction Status

Instruction         Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks (positions not recoverable): √√√√√  √√  √ √  √√  √ √

M:4

Reservation Stations

Name    Busy  Op     Vj                  Vk                 Qj     Qk     A
Load1   no    -      -                   -                  -      -      -
Load2   no    -      -                   -                  -      -      -
Add1    no    -      -                   -                  -      -      -
Mult1   no    -      -                   -                  -      -      -
Mult2   no    -      -                   -                  -      -      -
Store1  no    -      -                   -                  -      -      -
Store2  yes   Store  -                   Mem[Regs[R1] - 8]  -      -      Regs[R1]-8

Register Status

Field  F0  F2  F4  F6  F8  F10  F12  ...  F30
Qi     -   -   -   -   -   -    -    ...  -
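The step that repeats throughout the walk-through above is a finished station broadcasting its result: every station waiting on that tag (Qj or Qk) captures the value, and the producer is freed. A minimal sketch of that one step follows. The `Station` class and `broadcast` function are hypothetical names, and values are kept as symbolic strings in the style of the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None   # operand values, kept symbolic here
    vk: Optional[str] = None
    qj: Optional[str] = None   # tags of the stations that will produce operands
    qk: Optional[str] = None


def broadcast(stations, reg_status, producer, value):
    """Deliver `value` from station `producer` to every waiting consumer."""
    for s in stations:
        if s.qj == producer:
            s.vj, s.qj = value, None
        if s.qk == producer:
            s.vk, s.qk = value, None
        if s.name == producer:
            s.busy = False                 # the producing station is released
    for reg, tag in reg_status.items():
        if tag == producer:
            reg_status[reg] = None         # register no longer waits on a station


stations = [
    Station("Load1", True, "Load"),
    Station("Mult1", True, "MUL", vk="Regs[F2]", qj="Load1"),
    Station("Store1", True, "Store", qk="Mult1"),
]
reg_status = {"F0": "Load2", "F4": "Mult2"}
broadcast(stations, reg_status, "Load1", "Mem[Regs[R1] + 0]")
```

After the call, Mult1 holds both operands and can begin executing, while Store1 still waits on Mult1. F0 stays mapped to Load2 because the second iteration's load has already renamed it.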

VLIW Example

● Performance?

● Overhead?
