fall 2002 lecture 14: instruction scheduling. saman amarasinghe 26.035 ©mit fall 1998 outline...

125
Fall 2002 Lecture 14: Instruction Scheduling

Upload: hugo-watkins

Post on 28-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Fall 2002

Lecture 14: Instruction Scheduling

Page 2: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 2 6.035 ©MIT Fall 1998

Outline

• Modern architectures• Branch delay slots• Introduction to instruction scheduling• List scheduling• Resource constraints• Interaction with register allocation• Scheduling across basic blocks• Trace scheduling• Scheduling for loops• Loop unrolling• Software pipelining

5

Page 3: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 3 6.035 ©MIT Fall 1998

Simple Machine Model

• Instructions are executed in sequence– Fetch, decode, execute, store results– One instruction at a time

• For branch instructions, start fetching from a different location if needed– Check branch condition– Next instruction may come from a new location

given by the branch instruction

Page 4: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 4 6.035 ©MIT Fall 1998

Simple Execution Model

• 5 Stage pipe-line

• Fetch: get the next instruction• Decode: figure-out what that instruction is• Execute: Perform ALU operation

– address calculation in a memory op

• Memory: Do the memory access in a mem. Op.• Write Back: write the results back

fetch decode execute memory writeback

Page 5: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 5 6.035 ©MIT Fall 1998

Simple Execution Model

IF DE EXE MEM WB

IF DE EXE MEM WB

Inst 1

Inst 2

time

Page 6: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 6 6.035 ©MIT Fall 1998

Simple Execution Model

IF DE EXE MEM WB

IF DE EXE MEM WB

Inst 1

Inst 2

time

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

Inst 1

Inst 2

Inst 3

Inst 4

Inst 5

Page 7: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 7 6.035 ©MIT Fall 1998

Outline

• Modern architectures• Branch delay slots• Introduction to instruction scheduling• List scheduling• Resource constraints• Interaction with register allocation• Scheduling across basic blocks• Trace scheduling• Scheduling for loops• Loop unrolling• Software pipelining

5

Page 8: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 8 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• Does not know the location of the next instruction until later– after DE in jump instructions– after EXE in branch instructions

Page 9: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 9 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• Does not know the location of the next instruction until later– after DE in jump instructions– after EXE in branch instructions

IF DE EXE MEM WBBranch

Page 10: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 10 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• Does not know the location of the next instruction until later– after DE in jump instructions– after EXE in branch instructions

IF DE EXE MEM WB

IF DE EXE MEM WB

Branch

Inst

Page 11: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 11 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• Does not know the location of the next instruction until later– after DE in jump instructions– after EXE in branch instructions

• What to do with the middle 2 instructions?

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

Branch

???

???

Inst

Page 12: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 12 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• What to do with the middle 2 instructions?

• Stall the pipeline in case of a branch until we know the address of the next instruction– wasted cycles

IF DE EXE MEM WB

IF DE EXE MEM WB

Branch

Next inst

Page 13: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 13 6.035 ©MIT Fall 1998

Handling Cond. Branch Instructions

• What to do with the middle 2 instructions?• Delay the action of the branch

– Make branch affect only after two instructions– Two instructions after the branch gets executed

regardless of the branch

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

Branch

Next seq inst

Next seq inst

Branch target inst

Page 14: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 14 6.035 ©MIT Fall 1998

Branch Delay Slot(s)

• MIPS has a branch delay slot– The instruction after the branch gets executed if the

branch gets executed– Fetching from the Branch target takes place only

after that

Page 15: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 15 6.035 ©MIT Fall 1998

Branch Delay Slot(s)

• MIPS has a branch delay slot– The instruction after the branch gets executed if the

branch gets executed– Fetching from the Branch target takes place only

after that

• Branch delay slotble r3, foo

Branch delay slot

Page 16: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 16 6.035 ©MIT Fall 1998

Branch Delay Slot(s)

• MIPS has a branch delay slot– The instruction after the branch gets executed if the

branch gets executed– Fetching from the Branch target takes place only

after that

• Branch delay slotble r3, foo

Branch delay slot

• What to put in the branch delay slot?

Page 17: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 17 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Simple Solution: Put a no-op

ble r3, foo

Branch delay slot

Page 18: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 18 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Simple Solution: Put a no-op

• Wasted instruction, just like a stall

ble r3, foo

noop Branch delay slot

Page 19: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 19 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Move an instruction from above the branch

ble r3, foo

Branch delay slot

Page 20: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 20 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Move an instruction from above the branch• moved instruction executes iff branch executes

– Get the instruction from the same basic block as the branch

– Don’t move a branch instruction!

• instruction need to be moved over the branch– branch does not depend on the result of the inst.

prev_instr

ble r3, foo

prev_instr Branch delay slot

Page 21: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 21 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Move an instruction dominated by the branch instruction

ble r3, foo

dom_instr Branch delay slot

foo:

dom_instr

Page 22: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 22 6.035 ©MIT Fall 1998

Filling The Branch Delay Slot

• Move an instruction from the target– Instruction dominated by target– No other ways to reach target (if so take care of them)– If conditional branch, instruction should not have a

lasting effect if the branch is not taken

ble r3, foo

instr Branch delay slot

foo:instr

Page 23: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 23 6.035 ©MIT Fall 1998

Load Delay Slots

• Results of the loads is not available until end of MEM stage

• If the value of the load is used…what to do??

IF DE EXE MEM WB

IF DE EXE MEM WB

Load

Use of load

Page 24: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 24 6.035 ©MIT Fall 1998

Load Delay SlotsIf the value of the load is used…what to do??• Always stall one cycle• Stall one cycle if next instruction uses the value

– Need hardware to do this

• Have a delay slot for load– The new value is only available after two instructions– If next inst. uses the register, it’ll get the old value

IF DE EXE MEM WB

IF DE EXE MEM WB

IF DE EXE MEM WB

Load

???

Use of load

Page 25: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 25 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)

r3 = *(r1 + 8)

r4 = r2 + r3

r5 = r2 - 1

goto L1

Page 26: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 26 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)noopr3 = *(r1 + 8)noopr4 = r2 + r3r5 = r2 - 1goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 27: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 27 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)noopr3 = *(r1 + 8)noopr4 = r2 + r3r5 = r2 - 1goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 28: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 28 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)

r3 = *(r1 + 8)noopr4 = r2 + r3r5 = r2 - 1goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 29: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 29 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)noopr4 = r2 + r3r5 = r2 - 1goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 30: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 30 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)noopr4 = r2 + r3r5 = r2 - 1goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 31: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 31 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)r5 = r2 - 1r4 = r2 + r3

goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 32: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 32 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)r5 = r2 - 1r4 = r2 + r3goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 33: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 33 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)r5 = r2 - 1r4 = r2 + r3goto L1noop

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 34: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 34 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)r5 = r2 - 1

goto L1r4 = r2 + r3

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 35: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 35 6.035 ©MIT Fall 1998

Example

r2 = *(r1 + 4)r3 = *(r1 + 8)r5 = r2 - 1goto L1r4 = r2 + r3

• Assume 1 cycle delay on branches and 1 cycle latency for loads

Page 36: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 36 6.035 ©MIT Fall 1998

Outline

• Modern architectures• Branch delay slots• Introduction to instruction scheduling• List scheduling• Resource constraints• Interaction with register allocation• Scheduling across basic blocks• Trace scheduling• Scheduling for loops• Loop unrolling• Software pipelining

5

Page 37: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 37 6.035 ©MIT Fall 1998

From a Simple Machine Model to a Real Machine Model

• Many pipeline stages– MIPS R4000 has 8 stages

• Different instructions taking different amount of time to execute– mult 10 cycles– div 69 cycles– ddiv 133 cycles

• Hardware to stall the pipeline if an instruction uses a result that is not ready

Page 38: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 38 6.035 ©MIT Fall 1998

Real Machine Model cont.

• Most modern processors have multiple execution units (superscalar)– If the instruction sequence is correct, multiple

operations will happen in the same cycles– Even more important to have the right instruction

sequence

Page 39: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 39 6.035 ©MIT Fall 1998

Instruction Scheduling

• Reorder instructions so that pipeline stalls are minimized

Page 40: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 40 6.035 ©MIT Fall 1998

Constraints On Scheduling

• Data dependencies

• Control dependencies

• Resource Constraints

Page 41: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 41 6.035 ©MIT Fall 1998

Data Dependency between Instructions

• If two instructions access the same variable, they can be dependent

• Kind of dependencies – True: write read– Anti: read write– Output: write write

• What to do if two instructions are dependent.– The order of execution cannot be reversed – Reduce the possibilities for scheduling

Page 42: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 42 6.035 ©MIT Fall 1998

Computing Dependencies

• For basic blocks, compute dependencies by walking through the instructions

• Identifying register dependencies is simple– is it the same register?

• For memory accesses– simple: base + offset1 ?= base + offset2– data dependence analysis: a[2i] ?= a[2i+1]– interprocedural analysis: global ?= parameter– pointer alias analysis: p1 ?= p

Page 43: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 43 6.035 ©MIT Fall 1998

Representing Dependencies

• Using a dependence DAG, one per basic block

• Nodes are instructions, edges represent dependencies

Page 44: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 44 6.035 ©MIT Fall 1998

Representing Dependencies

• Using a dependence DAG, one per basic block

• Nodes are instructions, edges represent dependencies

1: r2 = *(r1 + 4)

2: r3 = *(r1 + 8)

3: r4 = r2 + r3

4: r5 = r2 - 1

Page 45: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 45 6.035 ©MIT Fall 1998

• Using a dependence DAG, one per basic block

• Nodes are instructions, edges represent dependencies

1: r2 = *(r1 + 4)

2: r3 = *(r1 + 8)

3: r4 = r2 + r3

4: r5 = r2 - 1 3

1

Representing Dependencies

2

4

Page 46: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 46 6.035 ©MIT Fall 1998

• Using a dependence DAG, one per basic block

• Nodes are instructions, edges represent dependencies

1: r2 = *(r1 + 4)

2: r3 = *(r1 + 8)

3: r4 = r2 + r3

4: r5 = r2 - 1

• Edge is labeled with Latency:– v(i j) = delay required between initiation times of i

and j minus the execution time required by i

3

1

Representing Dependencies

2

4

2 22

Page 47: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 47 6.035 ©MIT Fall 1998

Example

1: r2 = *(r1 + 4)

2: r3 = *(r2 + 4)

3: r4 = r2 + r3

4: r5 = r2 - 1

3

1 2

4

2 22

3

Page 48: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 48 6.035 ©MIT Fall 1998

Another Example

1: r2 = *(r1 + 4)

2: *(r1 + 4) = r3

3: r3 = r2 + r3

4: r5 = r2 - 1

3

1 2

4

Page 49: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 49 6.035 ©MIT Fall 1998

Another Example

1: r2 = *(r1 + 4)

2: *(r1 + 4) = r3

3: r3 = r2 + r3

4: r5 = r2 - 1

3

1 2

4

2 21

1

Page 50: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 50 6.035 ©MIT Fall 1998

Control Dependencies and Resource Constraints

• For now, lets only worry about basic blocks

• For now, lets look at simple pipelines

Page 51: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 51 6.035 ©MIT Fall 1998

Example

1: LA r1,array 2: LD r2,4(r1) 3: AND r3,r3,0x00FF 4: MULC r6,r6,100 5: ST r7,4(r6)6: DIVC r5,r5,1007: ADD r4,r2,r58: MUL r5,r2,r49: ST r4,0(r1)

Page 52: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 52 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

Page 53: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 53 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1

Page 54: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 54 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1

Page 55: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 55 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2

Page 56: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 56 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3

Page 57: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 57 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4

Page 58: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 58 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4

Page 59: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 59 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5

Page 60: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 60 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6

Page 61: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 61 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6

Page 62: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 62 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7

Page 63: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 63 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7

Page 64: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 64 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7 8

Page 65: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 65 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7 8 9

Page 66: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 66 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7 8 9

14 cycles!

Page 67: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 67 6.035 ©MIT Fall 1998

Outline

• Modern architectures• Branch delay slots• Introduction to instruction scheduling• List scheduling• Resource constraints• Interaction with register allocation• Scheduling across basic blocks• Trace scheduling• Scheduling for loops• Loop unrolling• Software pipelining

5

Page 68: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 68 6.035 ©MIT Fall 1998

List Scheduling Algorithm

• Idea– Do a topological sort of the dependence DAG– Consider when an instruction can be scheduled

without causing a stall– Schedule the instruction if it causes no stall and all

its predecessors are already scheduled

• Optimal list scheduling is NP-complete– Use heuristics when necessary

Page 69: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 69 6.035 ©MIT Fall 1998

List Scheduling Algorithm

• Create a dependence DAG of a basic block

• Topological SortREADY = nodes with no predecessors

Loop until READY is empty

Schedule each node in READY when no stalling

Update READY

Page 70: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 70 6.035 ©MIT Fall 1998

Heuristics for selection

• Heuristics for selecting from the READY list– pick the node with the longest path to a leaf in the

dependence graph– pick a node with most immediate successors – pick a node that can go to a less busy pipeline (in a

superscalar)

Page 71: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 71 6.035 ©MIT Fall 1998

Heuristics for selection

• pick the node with the longest path to a leaf in the dependence graph

• Algorithm (for node x)– If no successors dx = 0

– dx = MAX( dy + cxy) for all successors y of x

– reverse breadth-first visitation order

Page 72: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 72 6.035 ©MIT Fall 1998

Heuristics for selection

• pick a node with most immediate successors

• Algorithm (for node x):– fx = number of successors of x

Page 73: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 73 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

Page 74: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 74 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

31: LA r1,array2: LD r2,4(r1)3: AND r3,r3,0x00FF4: MULC r6,r6,1005: ST r7,4(r6)6: DIVC r5,r5,1007: ADD r4,r2,r58: MUL r5,r2,r49: ST r4,0(r1)

Page 75: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 75 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

Page 76: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 76 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0

Page 77: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 77 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

Page 78: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 78 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

Page 79: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 79 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7

Page 80: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 80 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

Page 81: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 81 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5

Page 82: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 82 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

Page 83: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 83 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { }

Page 84: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 84 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { }1, 3, 4, 6

Page 85: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 85 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 6, 1, 4, 3 }1, 3, 4, 6

Page 86: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 86 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 6, 1, 4, 3 }

Page 87: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 87 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 6, 1, 4, 3 }

6

Page 88: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 88 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 1, 4, 3 }

6

Page 89: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 89 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 1, 4, 3 }

6

Page 90: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 90 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 1, 4, 3 }

6 1

Page 91: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 91 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 4, 3 }

6 1

2

Page 92: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 92 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 2, 4, 3 }

6 1

Page 93: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 93 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 2, 4, 3 }

6 1

Page 94: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 94 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 2, 4, 3 }

6 1

Page 95: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 95 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 2, 4, 3 }

6 1 2

Page 96: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 96 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 4, 3 }

6 1 2

7

Page 97: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 97 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 4, 3 }

6 1 2

Page 98: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 98 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 4, 3 }

6 1 2

Page 99: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 99 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 4, 3 }

6 1 2

Page 100: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 100 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 4, 3 }

6 1 2

Page 101: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 101 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 4, 3 }

6 1 2 4

Page 102: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 102 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 3 }

6 1 2 4

5

Page 103: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 103 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 3, 5 }

6 1 2 4

Page 104: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 104 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 3, 5 }

6 1 2 4

Page 105: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 105 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 3, 5 }

6 1 2 4

Page 106: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 106 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 7, 3, 5 }

6 1 2 4 7

Page 107: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 107 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 3, 5 }

6 1 2 4 7

8, 9

Page 108: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 108 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 3, 5, 8, 9 }

6 1 2 4 7

Page 109: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 109 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 3, 5, 8, 9 }

6 1 2 4 7

Page 110: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 110 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 3, 5, 8, 9 }

6 1 2 4 7 3

Page 111: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 111 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 5, 8, 9 }

6 1 2 4 7 3

Page 112: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 112 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 5, 8, 9 }

6 1 2 4 7 3

Page 113: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 113 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 5, 8, 9 }

6 1 2 4 7 3

Page 114: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 114 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 5, 8, 9 }

6 1 2 4 7 3 5

Page 115: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 115 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 8, 9 }

6 1 2 4 7 3 5

Page 116: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 116 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 8, 9 }

6 1 2 4 7 3 5

Page 117: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 117 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 8, 9 }

6 1 2 4 7 3 5

Page 118: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 118 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 8, 9 }

6 1 2 4 7 3 5 8

Page 119: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 119 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 9 }

6 1 2 4 7 3 5 8

Page 120: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 120 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 9 }

6 1 2 4 7 3 5 8

Page 121: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 121 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 9 }

6 1 2 4 7 3 5 8

Page 122: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 122 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { 9 }

6 1 2 4 7 3 5 8 9

Page 123: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 123 6.035 ©MIT Fall 1998

Example1

6

8

2

7

9

1

1

3

4

1

4

5

3

3

d=0 d=0

d=0

d=0 d=3

d=3

d=7d=4

d=5f=1f=0

f=0f=1f=1

f=1

f=2

f=0 f=0

READY = { }

6 1 2 4 7 3 5 8 9

Page 124: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 124 6.035 ©MIT Fall 1998

Example

6 1 2 4 7 3 5 8 9

Results In1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

9 cycles

Page 125: Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe 26.035 ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to

Saman Amarasinghe 125 6.035 ©MIT Fall 1998

ExampleResults In

1: LA r1,array 1 cycle2: LD r2,4(r1) 1 cycle3: AND r3,r3,0x00FF 1 cycle4: MULC r6,r6,100 3 cycles5: ST r7,4(r6)6: DIVC r5,r5,100 4 cycles7: ADD r4,r2,r5 1 cycle8: MUL r5,r2,r4 3 cycles9: ST r4,0(r1)

1 2 3 4 st st 5 6 st st st 7 8 914 cycles

Vs9 cycles

6 1 2 4 7 3 5 8 9