Chapter 2: ILP and Its Exploitation


Uploaded by cera; posted on 15-Jan-2016.

Page 1: Chapter 2:  ILP and Its Exploitation

Chapter 2: ILP and Its Exploitation
• Review simple static pipeline
• ILP overview
• Dynamic branch prediction
• Dynamic scheduling, out-of-order execution
• Multiple issue (superscalar)
• Hardware-based speculation
• ILP limitation
• Intel P6 microarchitecture

Dynamic Scheduling

• If an instruction is stalled, there is no need to stall later instructions that do not depend on any of the stalled instructions. This is out-of-order execution.

• Example:
  DIVD F0,F2,F4     Long-running
  ADDD F10,F0,F8    Depends on DIVD
  SUBD F12,F8,F14   Independent of both

• The ADDD is stalled before execution, but the SUBD can go ahead.

• Out-of-order execution introduces WAR and WAW hazards.
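A minimal sketch of the check the example relies on, assuming a hypothetical (op, dest, sources) encoding and looking only at true (RAW) dependences: a later instruction may go ahead unless it reads a register that a stalled or still-running instruction has yet to write. The function name `can_start` is illustrative only.

```python
# Hypothetical (op, dest, sources) encoding of the example.
program = [
    ("DIVD", "F0", ("F2", "F4")),    # long-running
    ("ADDD", "F10", ("F0", "F8")),   # reads F0, so depends on DIVD
    ("SUBD", "F12", ("F8", "F14")),  # independent of both
]


def can_start(program, in_flight):
    """Return the ops (outside in_flight) that read no register still waiting to be written.

    in_flight holds indices of instructions that have not produced their result yet.
    Only RAW dependences are checked; WAR and WAW are ignored.
    """
    pending = {program[i][1] for i in in_flight}
    ready = []
    for i, (op, dest, srcs) in enumerate(program):
        if i in in_flight:
            continue
        if pending.isdisjoint(srcs):
            ready.append(op)
        else:
            pending.add(dest)  # a stalled instruction's result is pending too
    return ready


print(can_start(program, in_flight={0}))  # ['SUBD']
```

Because the check never compares a later instruction's destination with a stalled instruction's registers, it would let a later instruction overwrite a register that a stalled one still needs; that is the WAR/WAW problem noted above.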

Splitting Instruction Decode

• The single “Instruction Decode” stage is split into two parts:
  – Instruction issue, or dispatch (in order):
    • Determine the instruction type
    • Check for structural hazards
  – Read operands (can be out of order):
    • Stall the instruction until there are no data hazards
    • Read the operands
    • Release the instruction to begin execution

• Some sort of queue or buffer is needed to hold instructions until their operands are ready.

• Note: Out-of-order completion makes precise exception handling difficult! How to handle?

[Figure: Instruction Decode split into an Issue stage, a queue, and a Read Operand stage]
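One way to picture the split, as a sketch only: the function names, the (op, unit, sources) tuples, and the `free_units` counts below are hypothetical. `issue` stalls at the head of the queue on a structural hazard, while `read_operands` releases any buffered instruction whose operands are ready, regardless of program order. Like the slide, it does not yet deal with WAR/WAW hazards or precise exceptions.

```python
from collections import deque


def issue(fetch_queue, window, free_units):
    """Issue stage (in order): move the head instruction into the window if its unit is free."""
    if fetch_queue:
        op, unit, srcs = fetch_queue[0]
        if free_units.get(unit, 0) > 0:      # structural-hazard check
            free_units[unit] -= 1
            window.append(fetch_queue.popleft())
            return op
    return None  # stall: everything behind the head waits too


def read_operands(window, ready_regs):
    """Read-operands stage (out of order): release every waiting instruction whose sources are ready."""
    released = [ins for ins in window if all(r in ready_regs for r in ins[2])]
    for ins in released:
        window.remove(ins)
    return [ins[0] for ins in released]


# Hypothetical state: DIVD's result F0 is not ready yet.
fetch_queue = deque([
    ("DIVD", "div", ("F2", "F4")),
    ("ADDD", "add", ("F0", "F8")),
    ("SUBD", "add", ("F8", "F14")),
])
window, free_units = [], {"div": 1, "add": 2}
for _ in range(3):
    issue(fetch_queue, window, free_units)
released = read_operands(window, {"F2", "F4", "F8", "F14"})
print(released)  # ['DIVD', 'SUBD']; ADDD stays in the window waiting for F0
```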

Tomasulo’s Algorithm

• Tomasulo’s algorithm:
  – Another approach to dynamic scheduling with out-of-order execution
  – First used in the IBM 360/91 FPU, in the late 1960s
  – Based on the key concept of dynamic register renaming
    • Like the static renaming we used in the loop-unrolling example
• Some features:
  – Copes with long-latency operations (FPU or memory)
  – Eliminates WAR and WAW hazards without stalling
  – Instructions begin execution as soon as their operands are ready, with results forwarded directly instead of through the register file
  – Distributed hazard detection and execution control
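Dynamic renaming can be sketched in a few lines, using the six-instruction fragment from the example later in this chapter. Assumptions: station names are handed out in order with no reuse (enough stations), nothing has completed yet (so every pending source becomes a tag), and the loads' integer base registers are omitted. `rename` and `UNIT_PREFIX` are hypothetical names.

```python
UNIT_PREFIX = {"LD": "Load", "ADDD": "Add", "SUBD": "Add", "MULTD": "Mult", "DIVD": "Mult"}


def rename(program):
    """Rename destinations to reservation-station tags and sources to pending tags."""
    reg_status = {}   # architectural register -> tag of pending producer (Qi)
    used = {}         # stations handed out so far, per unit
    renamed = []
    for op, dest, srcs in program:
        # A source takes the pending producer's tag if any, else reads the register file.
        new_srcs = tuple(reg_status.get(r, f"R({r})") for r in srcs)
        prefix = UNIT_PREFIX[op]
        used[prefix] = used.get(prefix, 0) + 1
        tag = f"{prefix}{used[prefix]}"      # e.g. Load1, Mult2 (assumes no slot reuse)
        reg_status[dest] = tag               # later readers of dest now wait on this tag
        renamed.append((op, tag, new_srcs))
    return renamed, reg_status


program = [
    ("LD", "F6", ()), ("LD", "F2", ()),
    ("MULTD", "F0", ("F2", "F4")), ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")), ("ADDD", "F6", ("F8", "F2")),
]
renamed, reg_status = rename(program)
for op, tag, srcs in renamed:
    print(op, tag, srcs)
```

DIVD ends up reading Load1 rather than F6, and F6's final producer is Add2, so neither the DIVD/ADDD WAR nor the LD/ADDD WAW hazard on F6 remains. In the cycle-by-cycle walkthrough some values have already arrived at issue time (SUBD receives M(A1) directly), so the real station contents differ slightly.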

Tomasulo’s Algorithm

[Figure: block diagram with Instruction Fetch, Instruction Queue, Issue Logic / Control Unit, Register File, Reservation Stations, Execution unit 1, Execution unit 2…, and the Common Data Bus (CDB)]

• Key differences from scoreboarding:
  – Hazard detection and instruction issue are done per execution unit
  – Data results go straight to where they are needed, via the CDB
  – Loads and stores get their own execution units
  – Reservation stations are used for register renaming

Components of a Tomasulo Unit

• Reservation stations (RSs)
  – Buffer the operands of pending instructions while they wait to enter the execution units.
• Issue logic
  – Redirects (renames) instructions’ register outputs to reservation-station slots.
  – Results go directly to the RSs rather than through the register file.
• Distributed hazard detection
  – Handled separately by each functional unit
• Load and store buffers (can be combined with the RSs)
  – Queue up memory access requests

Simple FPU using Tomasulo’s Algorithm

Major Steps in Tomasulo (Fig 2.12)

• Issue
  – Get the next instruction from the FP instruction queue.
  – If a slot in the appropriate RS (or load/store buffer) is available, send the instruction there; otherwise stall it (structural hazard).
  – Send operand values to the RS if they are already available; otherwise, record the names of the RSs that will produce them.
• Execute
  – While operands are not yet available, monitor the CDB for them.
  – When all operands are in the RS, begin executing the instruction.
• Write result
  – When the result is available and the CDB is free, write the result to the CDB, and from there to the registers and to the RS/store-buffer slots waiting for it.
  – Update the register status and the RSs’ values, flags, busy state, etc.
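The three steps can be sketched with reservation stations kept as plain dictionaries keyed by the slide's field names. The functions `issue`, `ready_to_execute`, and `write_result` and the register values are hypothetical; loads are not modelled as stations here, and execution latency and CDB arbitration are left out.

```python
def issue(op, dest, srcs, name, stations, reg_status, regs):
    """Issue: fill reservation station `name`, copying ready values or producer tags."""
    rs = {"Op": op, "Busy": True, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for (v, q), reg in zip((("Vj", "Qj"), ("Vk", "Qk")), srcs):
        if reg in reg_status:
            rs[q] = reg_status[reg]   # not ready: remember which RS will produce it
        else:
            rs[v] = regs[reg]         # ready: copy the value now
    stations[name] = rs
    reg_status[dest] = name           # rename: dest will come from this RS


def ready_to_execute(stations):
    """Execute: a busy RS may start once neither Qj nor Qk is pending."""
    return [n for n, rs in stations.items()
            if rs["Busy"] and rs["Qj"] is None and rs["Qk"] is None]


def write_result(tag, value, stations, reg_status, regs):
    """Write result: broadcast (tag, value) on the CDB; every listener picks it up."""
    for rs in stations.values():
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, None
    for reg in [r for r, q in reg_status.items() if q == tag]:
        regs[reg] = value             # only registers still waiting on this tag
        del reg_status[reg]
    stations.pop(tag, None)           # free the producing RS (if it is modelled here)


# Hypothetical state: Load2 is still in flight and will produce F2.
regs = {"F2": 0.0, "F4": 2.0, "F6": 5.0}
stations, reg_status = {}, {"F2": "Load2"}
issue("MULTD", "F0", ("F2", "F4"), "Mult1", stations, reg_status, regs)
issue("SUBD", "F8", ("F6", "F2"), "Add1", stations, reg_status, regs)
print(ready_to_execute(stations))     # []
write_result("Load2", 3.0, stations, reg_status, regs)
print(ready_to_execute(stations))     # ['Mult1', 'Add1']
```

The single broadcast of Load2's result releases both Mult1 and Add1 in the same step, which is the distributed hazard detection advantage listed among Tomasulo’s two major advantages.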

Example for Tomasulo’s Algorithm

• We will go through the same code fragment to see how Tomasulo’s algorithm handles out-of-order execution:
  1. LD    F6,34(R2)
  2. LD    F2,45(R3)
  3. MULTD F0,F2,F4
  4. SUBD  F8,F6,F2
  5. DIVD  F10,F0,F6
  6. ADDD  F6,F8,F2
[Figure annotations: data dependence, anti-dependence, output dependence]
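To make the three dependence kinds concrete, here is a sketch that lists every true (RAW), anti (WAR), and output (WAW) dependence in the fragment; a dependence is reported only when no instruction in between writes the same register. The tuple encoding and function names are hypothetical, and the loads' integer base registers are ignored.

```python
program = [
    ("LD", "F6", ()),               # 1. LD F6,34(R2)  (integer base register omitted)
    ("LD", "F2", ()),               # 2. LD F2,45(R3)
    ("MULTD", "F0", ("F2", "F4")),  # 3. MULTD F0,F2,F4
    ("SUBD", "F8", ("F6", "F2")),   # 4. SUBD F8,F6,F2
    ("DIVD", "F10", ("F0", "F6")),  # 5. DIVD F10,F0,F6
    ("ADDD", "F6", ("F8", "F2")),   # 6. ADDD F6,F8,F2
]


def written_between(program, reg, i, j):
    """True if some instruction strictly between i and j writes reg."""
    return any(program[k][1] == reg for k in range(i + 1, j))


def dependences(program):
    """Return (kind, earlier, later, register) tuples, numbered from 1 like the listing."""
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, dest_j, srcs_j = program[j]
            if dest_i in srcs_j and not written_between(program, dest_i, i, j):
                found.append(("data (RAW)", i + 1, j + 1, dest_i))
            if dest_j in srcs_i and not written_between(program, dest_j, i, j):
                found.append(("anti (WAR)", i + 1, j + 1, dest_j))
            if dest_j == dest_i and not written_between(program, dest_i, i, j):
                found.append(("output (WAW)", i + 1, j + 1, dest_i))
    return found


for dep in dependences(program):
    print(dep)
```

Besides the DIVD/ADDD anti-dependence and the LD/ADDD output dependence on F6, it also reports an anti-dependence on F6 between SUBD (4) and ADDD (6), which renaming removes in the same way.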

Reservation Station Fields

• In each slot:
  – Op: the operation to perform on operands S1 and S2
  – Qj, Qk: the RS slots that will produce S1 and S2
  – Vj, Vk: the values of S1 and S2
  – Busy: the RS and its execution unit are occupied
• In register file entries and store buffer slots:
  – Qi: the RS slot containing the operation whose result should be stored here
• In load and store buffers (combined with the RSs):
  – A: holds the effective address for the load or store
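These fields map naturally onto a small record type. A sketch, assuming a Python dataclass with hypothetical names (`RSSlot`, `operands_ready`), populated with the Cycle 3 state of the example that follows; values stay symbolic (M(A1), R(F4)) as in the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSSlot:
    """One reservation-station (or load/store buffer) slot, using the slide's field names."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # value of S1, once known
    vk: Optional[str] = None  # value of S2, once known
    qj: Optional[str] = None  # RS slot that will produce S1 (None: value is in vj)
    qk: Optional[str] = None  # RS slot that will produce S2 (None: value is in vk)
    a: Optional[str] = None   # effective address (load/store buffers only)

    def operands_ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


# Cycle 3 of the example, with symbolic values as in the tables.
load1 = RSSlot("Load1", busy=True, a="34+R2")
load2 = RSSlot("Load2", busy=True, a="45+R3")
mult1 = RSSlot("Mult1", busy=True, op="MULTD", vk="R(F4)", qj="Load2")
qi = {"F0": "Mult1", "F2": "Load2", "F6": "Load1"}  # register result status (Qi)
print(mult1.operands_ready())  # False: still waiting for Load2
```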

Tomasulo Example

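Instruction status (the instruction stream):
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)
  LD F2,45(R3)
  MULTD F0,F2,F4
  SUBD F8,F6,F2
  DIVD F10,F0,F6
  ADDD F6,F8,F2
Load buffers (3): Load1, Load2, Load3 Busy=No; a busy buffer also holds its effective Address
Reservation stations (3 FP adder RSs, 2 FP multiplier RSs): Add1, Add2, Add3, Mult1, Mult2 Busy=No
  Fields: Time, Name, Busy, Op, Vj, Vk (values of S1, S2), Qj, Qk (RSs that will produce S1, S2)
Register result status (clock 0): F0, F2, F4, F6, F8, F10, F12, ..., F30 all empty; this row (FU) names the RS that will write each register
Legend: Clock is the clock cycle counter; Time is the FU countdown (execution cycles left).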

Cycle 1
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1
  LD F2,45(R3)
  MULTD F0,F2,F4
  SUBD F8,F6,F2
  DIVD F10,F0,F6
  ADDD F6,F8,F2
Load buffers: Load1 Busy=Yes, Address=34+R2; Load2, Load3 Busy=No
Reservation stations: Add1, Add2, Add3, Mult1, Mult2 Busy=No
Register result status (clock 1): F6=Load1

Cycle 2
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1
  LD F2,45(R3)     2
  MULTD F0,F2,F4
  SUBD F8,F6,F2
  DIVD F10,F0,F6
  ADDD F6,F8,F2
Load buffers: Load1 Busy=Yes, Address=34+R2; Load2 Busy=Yes, Address=45+R3; Load3 Busy=No
Reservation stations: Add1, Add2, Add3, Mult1, Mult2 Busy=No
Register result status (clock 2): F2=Load2, F6=Load1

Note: Can have multiple loads outstanding

Cycle 3
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3
  LD F2,45(R3)     2
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2
  DIVD F10,F0,F6
  ADDD F6,F8,F2
Load buffers: Load1 Busy=Yes, Address=34+R2; Load2 Busy=Yes, Address=45+R3; Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vk=R(F4), Qj=Load2
  Add1, Add2, Add3, Mult2: Busy=No
Register result status (clock 3): F0=Mult1, F2=Load2, F6=Load1

• Note: register names are removed (“renamed”) in the reservation stations; MULTD has issued.

• Load1 completing; what is waiting for Load1?

Cycle 4
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4
  DIVD F10,F0,F6
  ADDD F6,F8,F2
Load buffers: Load1 Busy=No; Load2 Busy=Yes, Address=45+R3; Load3 Busy=No
Reservation stations:
  Add1: Busy=Yes, Op=SUBD, Vj=M(A1), Qk=Load2
  Mult1: Busy=Yes, Op=MULTD, Vk=R(F4), Qj=Load2
  Add2, Add3, Mult2: Busy=No
Register result status (clock 4): F0=Mult1, F2=Load2, F6=M(A1), F8=Add1

• Load2 completing; what is waiting for Load2?

Cycle 5
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add1: Busy=Yes, Op=SUBD, Vj=M(A1), Vk=M(A2), Time=2
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=10
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add2, Add3: Busy=No
Register result status (clock 5): F0=Mult1, F2=M(A2), F6=M(A1), F8=Add1, F10=Mult2

• Timers start counting down for Add1 and Mult1.

Cycle 6
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add1: Busy=Yes, Op=SUBD, Vj=M(A1), Vk=M(A2), Time=1
  Add2: Busy=Yes, Op=ADDD, Vk=M(A2), Qj=Add1
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=9
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add3: Busy=No
Register result status (clock 6): F0=Mult1, F2=M(A2), F6=Add2, F8=Add1, F10=Mult2

• Issue ADDD here despite name dependency on F6?

Cycle 7
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add1: Busy=Yes, Op=SUBD, Vj=M(A1), Vk=M(A2), Time=0
  Add2: Busy=Yes, Op=ADDD, Vk=M(A2), Qj=Add1
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=8
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add3: Busy=No
Register result status (clock 7): F0=Mult1, F2=M(A2), F6=Add2, F8=Add1, F10=Mult2

• Add1 (SUBD) completing; what is waiting for it?

Cycle 8
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add2: Busy=Yes, Op=ADDD, Vj=(M-M), Vk=M(A2), Time=2
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=7
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add3: Busy=No
Register result status (clock 8): F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2

Cycle 9
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add2: Busy=Yes, Op=ADDD, Vj=(M-M), Vk=M(A2), Time=1
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=6
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add3: Busy=No
Register result status (clock 9): F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2

Cycle 10
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Add2: Busy=Yes, Op=ADDD, Vj=(M-M), Vk=M(A2), Time=0
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=5
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add3: Busy=No
Register result status (clock 10): F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2

• Add2 (ADDD) completing; what is waiting for it?

Cycle 11
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=4
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add2, Add3: Busy=No
Register result status (clock 11): F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

• Write result of ADDD here?
• All quick instructions complete in this cycle!

Cycle 12
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=3
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add2, Add3: Busy=No
Register result status (clock 12): F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

Cycle 13
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=2
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add2, Add3: Busy=No
Register result status (clock 13): F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

Cycle 14
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=1
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add2, Add3: Busy=No
Register result status (clock 14): F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

Cycle 15
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3      15
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vj=M(A2), Vk=R(F4), Time=0
  Mult2: Busy=Yes, Op=DIVD, Vk=M(A1), Qj=Mult1
  Add1, Add2, Add3: Busy=No
Register result status (clock 15): F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

• Mult1 (MULTD) completing; what is waiting for it?

Cycle 16
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3      15         16
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult2: Busy=Yes, Op=DIVD, Vj=M*F4, Vk=M(A1), Time=40
  Add1, Add2, Add3, Mult1: Busy=No
Register result status (clock 16): F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

• Just waiting for Mult2 (DIVD) to complete

Cycle 55 (after skipping ahead)
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3      15         16
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult2: Busy=Yes, Op=DIVD, Vj=M*F4, Vk=M(A1), Time=1
  Add1, Add2, Add3, Mult1: Busy=No
Register result status (clock 55): F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

Cycle 56
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3      15         16
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5      56
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult2: Busy=Yes, Op=DIVD, Vj=M*F4, Vk=M(A1), Time=0
  Add1, Add2, Add3, Mult1: Busy=No
Register result status (clock 56): F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2

• Mult2 (DIVD) is completing; what is waiting for it?

Cycle 57
Instruction status:
  Instruction      Issue  Exec comp  Write result
  LD F6,34(R2)     1      3          4
  LD F2,45(R3)     2      4          5
  MULTD F0,F2,F4   3      15         16
  SUBD F8,F6,F2    4      7          8
  DIVD F10,F0,F6   5      56         57
  ADDD F6,F8,F2    6      10         11
Load buffers: Load1, Load2, Load3 Busy=No
Reservation stations:
  Mult2: Busy=Yes, Op=DIVD, Vj=M*F4, Vk=M(A1)
  Add1, Add2, Add3, Mult1: Busy=No
Register result status (clock 57): F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Result

• Once again: In-order issue, out-of-order execution, and out-of-order completion.
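The whole walkthrough can be reproduced with a small cycle-level timing model. This is a sketch under stated assumptions, not a faithful 360/91 model: latencies (load 2, add/subtract 2, multiply 10, divide 40 cycles) are read off the tables above; there is one CDB with oldest-first arbitration (an assumption); an instruction begins executing the cycle after it issues or after its last operand is broadcast; and an instruction issuing in the same cycle as a broadcast sees the value, as SUBD does in cycle 4. `Instr`, `simulate`, and the tables `LATENCY`, `UNIT`, `RS_SLOTS` are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field

# Assumed latencies (cycles), read off the example tables.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
UNIT = {"LD": "load", "ADDD": "add", "SUBD": "add", "MULTD": "mult", "DIVD": "mult"}
RS_SLOTS = {"load": 3, "add": 3, "mult": 2}   # 3 load buffers, 3 adder RSs, 2 mult RSs


@dataclass
class Instr:
    op: str
    dest: str
    srcs: list                  # FP sources; integer base registers are ignored
    issue: int = 0
    complete: int = 0           # last execution cycle (0 = not yet known)
    write: int = 0              # cycle of the CDB broadcast (0 = not yet)
    waiting_on: set = field(default_factory=set)   # pending producers (Qj/Qk)


def simulate(program):
    reg_status = {}             # register -> index of pending producer (Qi)
    next_issue = 0
    for cycle in itertools.count(1):
        # Write result: one CDB, so at most one broadcast per cycle (oldest first).
        done = [i for i, ins in enumerate(program)
                if ins.complete and ins.complete < cycle and not ins.write]
        if done:
            w = min(done)
            program[w].write = cycle
            if reg_status.get(program[w].dest) == w:
                del reg_status[program[w].dest]   # clear Qi unless a later writer renamed it
            for ins in program:
                if w in ins.waiting_on:
                    ins.waiting_on.discard(w)
                    if not ins.waiting_on:        # last operand arrived: run from next cycle
                        ins.complete = cycle + LATENCY[ins.op]
        # Issue: in order, at most one per cycle, only if a matching RS is free.
        if next_issue < len(program):
            ins = program[next_issue]
            unit = UNIT[ins.op]
            busy = sum(1 for p in program
                       if p.issue and not p.write and UNIT[p.op] == unit)
            if busy < RS_SLOTS[unit]:
                ins.issue = cycle
                ins.waiting_on = {reg_status[r] for r in ins.srcs if r in reg_status}
                if not ins.waiting_on:            # operands ready at issue
                    ins.complete = cycle + LATENCY[ins.op]
                reg_status[ins.dest] = next_issue  # rename the destination
                next_issue += 1
        if all(ins.write for ins in program):
            return program


prog = [
    Instr("LD", "F6", []), Instr("LD", "F2", []),
    Instr("MULTD", "F0", ["F2", "F4"]), Instr("SUBD", "F8", ["F6", "F2"]),
    Instr("DIVD", "F10", ["F0", "F6"]), Instr("ADDD", "F6", ["F8", "F2"]),
]
for ins in simulate(prog):
    print(ins.op, ins.dest, ins.issue, ins.complete, ins.write)
```

For this fragment the model yields Issue / Exec complete / Write result of 1/3/4, 2/4/5, 3/15/16, 4/7/8, 5/56/57 and 6/10/11, matching the Cycle 57 table. Changing `RS_SLOTS` or the latencies is a quick way to see structural stalls and CDB conflicts appear.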

Tomasulo’s Two Major Advantages

• Distribution of the hazard detection logic
  – Distributed reservation stations and the CDB
  – If multiple instructions are waiting on a single result, and each already has its other operand, they can all be released simultaneously by the broadcast on the CDB.
  – If a centralized register file were used, the units would have to read their results from the registers when register buses became available.
• Elimination of stalls for WAW and WAR hazards

Elimination of WAR Hazards

• Note the potential WAR hazard between DIVD and ADDD involving F6.

• But as soon as DIVD enters its RS, it becomes independent of the ADDD!
  – The 2nd source operand no longer refers to F6; it holds the value of F6 produced earlier by the LD.
  – If the LD had not yet completed, the 2nd operand would refer to the LD’s RS (load buffer), but still not to F6!
• So ADDD can write its new value of F6 before DIVD executes without affecting DIVD’s operand.

Elimination of WAW Hazards

• Note the potential WAW hazard between the first LD and the last ADDD involving F6.
• But as soon as ADDD is issued, the register status table is updated with F6 assigned to Add2.
• So if the LD completes after the ADDD has issued, it will not update F6, which eliminates the WAW hazard.

Tomasulo Drawbacks

• Complexity
  – Delays of the 360/91, MIPS 10000, Alpha 21264, and IBM PPC 620 in CA:AQA 2/e, but not in silicon!
• Many associative stores (CDB) at high speed
• Performance limited by the Common Data Bus
  – Each CDB must go to multiple functional units: high capacitance, high wiring density
  – With one CDB, the number of functional units that can complete per cycle is limited to one!
    • Multiple CDBs mean more FU logic for parallel associative stores
• Non-precise interrupts!
  – This will be addressed later

Overlap Loop Interactions

• Register renaming
  – Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
• Reservation stations
  – Permit instruction issue to advance past integer control-flow operations
  – Also buffer old values of registers, totally avoiding the WAR stall
• Other perspective: Tomasulo builds the data-flow dependence graph on the fly
• Note: branch prediction is still needed!

Dynamic Loop Scheduling

• Loop example:
  Loop: LD    F0,0(R1)
        MULTD F4,F0,F2
        SD    0(R1),F4
        SUBI  R1,R1,#8
        BNEZ  R1,Loop
• Note that data dependences can span loop iterations.
• But with Tomasulo’s algorithm and predict-taken branches, multiple iterations can issue and begin execution simultaneously!
• This acts like dynamic loop unrolling by the hardware.
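A sketch of the "data-flow graph built on the fly" view for this loop: each source operand is linked to its most recent writer through a table that plays the role of Qi, so only true dependences become edges. It assumes two iterations issued under predict-taken and ignores memory dependences through 0(R1) (that is, it assumes the load and store addresses differ); `dataflow_edges` is a hypothetical name.

```python
def dataflow_edges(trace):
    """Return RAW edges (producer index, consumer index, register) for an instruction trace."""
    last_writer = {}   # register -> index of its most recent producer (plays the role of Qi)
    edges = []
    for idx, (_, dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                edges.append((last_writer[reg], idx, reg))
        if dest is not None:
            last_writer[dest] = idx
    return edges


body = [
    ("LD F0,0(R1)", "F0", ("R1",)),
    ("MULTD F4,F0,F2", "F4", ("F0", "F2")),
    ("SD 0(R1),F4", None, ("R1", "F4")),
    ("SUBI R1,R1,#8", "R1", ("R1",)),
    ("BNEZ R1,Loop", None, ("R1",)),
]
trace = body * 2   # two iterations, branch predicted taken
edges = dataflow_edges(trace)
for src, dst, reg in edges:
    print(f"{trace[src][0]:<15} -> {trace[dst][0]:<15} via {reg}")
```

The second iteration's LD has a single incoming edge, from the first iteration's SUBI via R1. It does not wait on the first iteration's MULTD or SD even though both iterations reuse F0 and F4, which is why renaming lets the iterations overlap.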

Check Figure 2.13.