lecture 8: data-capture instruction schedulers. the goal is to execute instructions in dataflow...

45
Advanced Microarchitecture Lecture 8: Data-Capture Instruction Schedulers

Upload: nathalie-scarbro

Post on 29-Mar-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

Advanced MicroarchitectureLecture 8: Data-Capture Instruction Schedulers

Page 2: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

2

Out-of-Order Execution• The goal is to execute instructions in

dataflow order as opposed to the sequential order specified by the ISA

• The renamer plays a critical role by removing all of the false register dependencies

• The scheduler is responsible for:– for each instruction, detecting when all

dependencies have been satisifed (and therefore ready to execute)

– propagating readiness information between instructions

Lecture 8: Data-Capture Instruction Schedulers

Page 3: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

3

Out-of-Order Execution (2)

Lecture 8: Data-Capture Instruction Schedulers

StaticProgram

Fetc

hDynamic

InstructionStream

Renam

e

RenamedInstruction

Stream

Sch

edule

DynamicallyScheduled

Instructions

Out-of-order =out of the originalsequential order

Page 4: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

4

Superscalar != Out-of-Order

Lecture 8: Data-Capture Instruction Schedulers

A: R1 = Load 16[R2]B: R3 = R1 + R4C: R6 = Load 8[R9]D: R5 = R2 – 4E: R7 = Load 20[R5]F: R4 = R4 – 1G: BEQ R4, #0

C

D

E

cach

e m

iss

B

C

D

E

F

G

10 cycles

B

F

G

7 cycles

A

B

C D

E

F

G

C

D E

F

G

B

5 cycles

B C

D

E F

G

8 cycles

A

cach

e m

iss

1-wideIn-Order

A

cach

e m

iss

2-wideIn-Order

A

1-wideOut-of-Order

A

cach

e m

iss

2-wideOut-of-Order

Page 5: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

5

Data-Capture Scheduler• At dispatch, instruction

read all available operands from the register files and store a copy in the scheduler

• Unavailable operands will be “captured” from the functional unit outputs

• When ready, instructions can issue directly from the scheduler without reading additional operands from any other register files

Lecture 8: Data-Capture Instruction Schedulers

Fetch &Dispatch

ARF PRF/ROB

Data-CaptureScheduler

FunctionalUnits

Physica

l reg

ister u

pdate

Bypass

Page 6: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

6

Non-Data-Capture Scheduler

Lecture 8: Data-Capture Instruction Schedulers

Fetch &Dispatch

ARF PRF

Scheduler

FunctionalUnits

Physica

l reg

ister

up

date

More on thisnext lecture!

Page 7: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

7

Components of a Scheduler

Lecture 8: Data-Capture Instruction Schedulers

A B

C

D

F

E

G

Buffer for unexecutedinstructions

Method for trackingstate of dependencies(resolved or not)

Arbiter BMethod for choosing between multiple ready instructions competing for the same resource

Method for notificationof dependency resolution

“Scheduler Entries” or“Issue Queue” (IQ) or“Reservation Stations” (RS)

Page 8: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

8

Lather, Rinse, Repeat…• Scheduling Loop or Wakeup-Select Loop

– Wake-Up Part:• Instructions selected for execution notify dependents

(children) that the dependency has been resolved• For each instruction, check whether all input

dependencies have been resolved– if so, the instruction is “woken up”

– Select Part:• Choose which instructions get to execute

– If 3 add instructions are ready at the same time, but there are only two adders, someone must wait to try again in the next cycle (and again, and again until selected)

Lecture 8: Data-Capture Instruction Schedulers

Page 9: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

9

Scalar Scheduler (Issue Width = 1)

Lecture 8: Data-Capture Instruction Schedulers

T14T16

T39T6

T17T39

T15T39

=

=

=

=

=

=

=

=

T39

T8

T17

T42

Sele

ct Log

ic

To E

xecu

te Lo

gic

Tag B

roadca

st Bus

Page 10: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

10

Tags,ReadyLogic

SelectLogic

Superscalar Scheduler (detail of one entry)

Lecture 8: Data-Capture Instruction Schedulers

TagBroadcast

Buses

====

====

bid

grants

SrcL RdyLValL IssuedSrcL RdyLValL Dst

Page 11: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

11

Interaction with Execution

Lecture 8: Data-Capture Instruction Schedulers

A

Sele

ct Log

ic

SRD SL opcode ValL ValR

ValL ValR

ValL ValRValL ValR

Payload RAM

Page 12: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

12

Again, But Superscalar

Lecture 8: Data-Capture Instruction Schedulers

A

B

Sele

ct Log

ic

SR

SR

D

D

SL

SL

opcode ValL ValR

ValL ValR

ValL ValRValL ValR

opcode ValL ValR

ValL ValR

ValL ValR

The schedulercaptures thedata, hence “Data-Capture”

Page 13: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

13

Issue Width• Maximum number of instructions selected

for execution each cycle is the issue width– Previous slide showed an issue width of two– The slide before that showed the details of a

scheduler entry for an issue width of four• Hardware requirements:

– Typically, an Issue Width of N requires N tag broadcast buses

– Not always true: can specialize such that, for example, one “issue slot” can only handle branches

Lecture 8: Data-Capture Instruction Schedulers

Page 14: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

14

Pipeline Timing

Lecture 8: Data-Capture Instruction Schedulers

Select Payload

Wakeup

A: Execute

CaptureB:

tag broadcastresultbroadcast

enablecapture

on tag match

Select Payload Execute

WakeupC:enablecapture

tag broadcast

Cycle i Cycle i+1

A

B

C

Page 15: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

15

Pipelined Timing

Lecture 8: Data-Capture Instruction Schedulers

Select PayloadA: Execute

CaptureB:

tag broadcastresultbroadcast

enablecapture

Select Payload Execute

CaptureC:enablecapture

tag broadcast

Cycle i Cycle i+1

Select Payload Execute

Cycle i+2 Cycle i+3

Wakeup

Wakeup

A

B

CCan’t read and writepayload RAM at the

same time; may needto bypass the results

Page 16: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

16

Pipelined Timing (2)

• Previous slide placed the pipeline boundary at the writing of the ready bits

• This slide shows a pipeline where latches are placed right before the tag broadcast

Lecture 8: Data-Capture Instruction Schedulers

Select PayloadA: Execute

CaptureB:

tag broadcastresultbroadcast

enablecapture

Select Payload Execute

Cycle i Cycle i+1 Cycle i+2

Wakeup

A

B

Wakeup

Page 17: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

17

More Pipelined Timing

Lecture 8: Data-Capture Instruction Schedulers

Select PayloadA: Execute

CaptureB:tag broadcast

result broadcastand bypass

enablecapture

C:

Cycle i

Wakeup

Select

Wakeup

Payload Execute

Select Payload Exec

Capture

Cycle i+1 Cycle i+2 Cycle i+3

CaptureWakeuptag match

on firstoperand tag match

on secondoperand

(now C is ready)No simultaneous

read/write!

A

B

C

Need a secondlevel of bypassing

Page 18: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

18

More-er Pipelined Timing

Lecture 8: Data-Capture Instruction Schedulers

Select PayloadA: Execute

CaptureC:

Cycle i

Wakeup

i+1 i+2 i+3

Select Payload Execute

Wakeup Capture

Select Payload Ex

i+4 i+5

D:

A

C

B

D

Wakeup Capture

B: Select Select Payload Execute

A&B bothready, onlyA selected,B bids again

AC and CD mustbe bypassed, butno bypass for BD

Dependent instructionscannot execute in

back-to-back cycles!

Page 19: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

19

Critical Loops• Wakeup-Select Loop cannot be trivially

pipelined while maintaining back-to-back execution of dependent instructions

Lecture 8: Data-Capture Instruction Schedulers

A

B

C

A

B

C

RegularScheduling

No Back-to-Back• Worst-case IPC reduction by ½

• Shouldn’t be that bad (previous slide had IPC of 4/3)

• Studies indicate 10-15% IPC penalty

• “Loose Loops Sink Chips”, Borch et al.

Page 20: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

20

IPC vs. Frequency• 10-15% IPC drop doesn’t seem bad if we

can double the clock frequency

Lecture 8: Data-Capture Instruction Schedulers

1000ps 500ps500ps2.0 IPC, 1GHz 1.7 IPC, 2GHz

2 BIPS 3.4 BIPS• Frequency doesn’t double

– latch/pipeline overhead– unbalanced stages

• Other sources of IPC penalties– branches: ↑ pipe depth, ↓ predictor size, ↑ predict-to-

update latency– caches/memory: same time in seconds, ↑ frequency,

more cycles• Power limitations: more logic, higher frequency

P=½CV2f

900ps 450ps 450ps

900ps 350 550

1.5GHz

Page 21: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

21

Select Logic• Goal: minimize DFG height (execution time)• NP-Hard

– Precedence Constrained Scheduling Problem– Even harder because the entire DFG is not

known at scheduling time• Scheduling decisions made now may affect the

scheduling of instructions not even fetched yet

• Heuristics– For performance– For ease of implementation

Lecture 8: Data-Capture Instruction Schedulers

Page 22: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

22

Simple Select Logic

Lecture 8: Data-Capture Instruction Schedulers

Scheduler Entries

1

S entriesyields O(S)gate delay

Grant0 = 1Grant1 = !Bid0

Grant2 = !Bid0 & !Bid1

Grant3 = !Bid0 & !Bid1 & !Bid2

Grantn-1 = !Bid0 & … & !Bidn-21

x0

x1

x2

x3

x4

x5

x6

x7

x8

grant0

xi = Bidi

granti

grant1

grant2

grant3

grant4

grant5

grant6

grant7

grant8

grant9

O(log S) gates

Page 23: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

23

Simple Select Logic• Instructions may be located in scheduler

entries in no particular order– The first ready entry may be the oldest,

youngest, anywhere in between– Simple select results in a “random” schedule

• the schedule is still “correct” in that no dependencies are violated

• it just may be far from optimal

Lecture 8: Data-Capture Instruction Schedulers

Page 24: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

24

Oldest First Select• Intuition:

– An instruction that has just entered the scheduler will likely have few, if any, dependents (only intra-group)

– Similarly, the instructions that have been in the scheduler the longest (the oldest) are likely to have the most dependents

– Selecting the oldest instructions has a higher chance of satisfying more dependencies, thus making more instructions ready to execute more parallelism

Lecture 8: Data-Capture Instruction Schedulers

Page 25: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

25

Implementing Oldest First Select

Lecture 8: Data-Capture Instruction Schedulers

BA

CDEF

H

Write instructionsinto scheduler inprogram order

G

EF

HG

Compress Up

KL

EF

HG

IJ

Newlydispatched

IJ

BD

Page 26: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

26

Implementing Oldest First Select (2)• Compressing buffers are very complex

– gates, wiring, area, power

Lecture 8: Data-Capture Instruction Schedulers

Ex. 4-wideNeed up toshift by 4

An entireinstruction’s

worth ofdata: tags,opcodes,

immediates,readiness, etc.

Page 27: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

27

Implementing Oldest First Select (3)

Lecture 8: Data-Capture Instruction Schedulers

B

C

A

D

E

F

H

G

4

6053172

0

3

22

0

0

Age-Aware Select Logic

Grant

Page 28: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

28

Handling Multi-Cycle Instructions

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec

Sched PayLd Exec

Add R1 = R2 + R3

Xor R4 = R1 ^ R5

Sched PayLd Exec Add R4 = R1 + R5

Sched PayLd Exec Mul R1 = R2 × R3Exec Exec

Add attemps to execute too early!Result not ready for another two cycles.

Page 29: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

29

Delayed Tag Broadcast

• It works, but…– Must make sure tag broadcast bus will be

available N cycles in the future when needed– Bypass, data-capture potentially get more

complex

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec Add R4 = R1 + R5

Sched PayLd Exec Mul R1 = R2 × R3Exec Exec

Assume pipelined such that tagbroadcast occurs at cycle boundary

Page 30: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

30

Delayed Tag Broadcast (2)

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec Add R4 = R1 + R5

Sched PayLd Exec Mul R1 = R2 × R3Exec Exec

Sched PayLd Exec Sub R7 = R8 – #1

Sched PayLd Exec Xor R9 = R9 ^ R6

Assumeissue width

equals 2

In this cycle, three instructionsneed to broadcast their tags!

Page 31: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

31

Delayed Tag Broadcast (3)• Possible solutions

1. Have one select for issuing, another select for tag broadcast• messes up timing of data-capture

2. Pre-reserve the bus• select logic more complicated, must track usage in

future cycles in addition to the current cycle

3. Hold the issue slot from initial launch until tag broadcast

Lecture 8: Data-Capture Instruction Schedulers

sch payl exec exec exec

Issue width effectively reduced by one for three cycles

Page 32: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

32

Delayed Wakeup• Push the delay to the consumer

Lecture 8: Data-Capture Instruction Schedulers

=

Tag Broadcast forR1 = R2 × R3

R1

=

R4

R5 = R1 + R4

ready!

Tag arrives, but we waitthree cycles beforeacknowledging it

Also need to know parent’s latency

Page 33: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

33

Non-Deterministic Latencies• Problem with previous approaches is that

they assume that all instruction latencies are known at the time of scheduling– Makes things uglier for the delayed broadcast– This pretty much kills the delayed wakeup

approach• Examples

– Load instructions• Latency {L1_lat, L2_lat, L3_lat, DRAM_lat}

– DRAM_lat is not a constant either, queuing delays– Some architecture specific cases

• PowerPC 603 has a “early out” for multiplication with a low-bit-width multiplicand

• Intel Core 2’s divider also has an early out

Lecture 8: Data-Capture Instruction Schedulers

Page 34: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

34

The Wait-and-See Approach• Just wait and see whether a load hits or

misses in the cache

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec

R2 = R1 + #4Sched PayLd Exec

R1 = 16[$sp]

Exec Exec Cache hit known,can broadcast tag

Load-to-Use latencyincreases by 2 cycles(3 cycle load appears as 5)

Scheduler

DL1 Tags

DL1Data

May be able todesign cache s.t.hit/miss knownbefore data

Exec Exec Exec

Sched PayLd Exec

Penalty reduced to 1 cycle

Page 35: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

35

Load-Hit Speculation• Caches work pretty well

– hit rates are high, otherwise caches wouldn’t be too useful

– Just assume all loads will hit in the cache

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec R2 = R1 + #4

Sched PayLd Exec R1 = 16[$sp]Exec Exec Cache hit,data forwardedBroadcast delayed

by DL1 latency

• Um, ok, what happens when there’s a load miss?

Page 36: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

36

Load Miss Scenario

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec

Sched PayLd Exec Exec ExecBroadcast delayedby DL1 latency

Exec … Exec

Cache Miss Detected!

Value at cache output is bogus

Invalidate the instruction(ALU output ignored)

Sched PayLd Exec

Rescheduled assuminga hit at the DL2 cache

There could be a miss at the L2, and again at the L3 cache. A single load can waste multiple issuing opportunities.

Each mis-scheduling wastes an issue slot:the tag broadcast bus, payload RAM read port, writeback/bypass bus, etc. could have been used for another instruction

Broadcast delayed by L2 latency

L2 hit

Page 37: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

37

Scheduler Deallocation• Normally, as soon as an instruction issues,

it can vacate its scheduler entry– The sooner an entry is deallocated, the sooner

another instruction can reuse that entry leads to that instruction executing earlier

• In the case of a load, the load must hang around in the scheduler until it can be guaranteed that it will not have to rebroadcast its destination tag– Decreases the effective size of the scheduler

Lecture 8: Data-Capture Instruction Schedulers

Page 38: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

38

“But wait, there’s more!”

Lecture 8: Data-Capture Instruction Schedulers

Sched PayLd Exec

Sched PayLd Exec Exec Exec

DL1 Miss

Sched PayLd Exec

Sched PayLd Exec

Sched PayLd Exec

Squash

Not only children getsquashed, there may be grand-children to squash as well

All waste issue slotsAll must be rescheduledAll waste powerNone may leave scheduler until load hit known

Sched PayLd Ex

Sched PayLd Ex

Sched PayLd Ex

Sched PayLd Exec

Page 39: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

39

Squashing• The number of cycles worth of dependents

that must be squashed is equal to Cache-Miss-Known latency minus one– previous example, miss-known latency = 3

cycles, there are two cycles worth of mis-scheduled dependents

– Early miss-detection helps reduce mis-scheduled instructions

– A load may have many children, but the issue width limits how many can possibly be mis-scheduled• Max = Issue-width × (Miss-known-lat – 1)

Lecture 8: Data-Capture Instruction Schedulers

Page 40: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

40

Squashing (2)• Simple: squash anything “in-flight”

between schedule and execute

Lecture 8: Data-Capture Instruction Schedulers

SchedPayLd Exec Exec Exec

SchedPayLd Exec

SchedPayLd Exec

SchedPayLd Exec

SchedPayLd Exec

SchedPayLd Exec

SchedPayLd Exec

SchedPayLd Exec

• This may include non-dependent instructions

• All instructions must stay in the scheduler for a few extra cycles to make sure they will not be rescheduled due to a squash

Page 41: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

41

Squashing (3)• Selective squashing: use “load colors”

– each load is assigned a unique color– every dependent “inherits” its parents’ colors– on a load miss, the load broadcasts its color and

anyone in the same color group gets squashed• An instruction may end up with many

colors

Lecture 8: Data-Capture Instruction Schedulers

… …Explicitly tracking each color wouldrequire a huge number of comparisons

Page 42: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

42

Squashing (4)• Can list “colors” in unary (bit-vector) form

Lecture 8: Data-Capture Instruction Schedulers

Load R1 = 16[R2]1 0 0 0 0 0 0 0

Add R3 = R1 + R41 0 0 0 0 0 0 0

Load R5 = 12[R7]1 0 0 0 0 0 00

Load R8 = 0[R1]1 0 1 0 0 0 0 0

Load R7 = 8[R4]100 0 0 0 00

Add R6 = R8 + R71 0 1 0 0 0 01

• Each instruction’s color vector is the bitwise OR of its parents’ vectors

• A load miss now only squashes the dependent instructions

• Hardware cost increases quadratically with number of load instructions

X

X

X

Page 43: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

43

Allocation• Allocate in-order,

Deallocate in-order• Very simple!• Smaller effective

scheduler size– instructions may have

already executed out-of-order, but their RS entries cannot be reused

– Can be very bad if a load goes to main memory

Lecture 8: Data-Capture Instruction Schedulers

Circular Buffer

Head

Tail

Tail

Page 44: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

44

Allocation (2)• With arbitrary

placement, entries much better utilized

• Allocator more complex– must scan availability and

find N free entries• Write logic more

complex– must route N instructions

to arbitrary entries of the scheduler

Lecture 8: Data-Capture Instruction Schedulers

RS A

lloca

tor

Entry availabilitybit-vector

0000100100000111

Page 45: Lecture 8: Data-Capture Instruction Schedulers. The goal is to execute instructions in dataflow order as opposed to the sequential order specified by

45

Allocation (3)• Segment the entries

– only one entry per segment may be allocated per cycle

– instead of 4-of-16 alloc (previous slide), each allocator only does 1-of-4

– write logic simplified as well

• Still possible inefficiencies– full segment leads to

allocating less than N instructions per cycleLecture 8: Data-Capture Instruction Schedulers

0010101000010110

Allo

cA

lloc

Allo

cA

lloc

A

B

C

DX

Free RS entries exist, justnot in the correct segment

0