Static ILP: Static (Compiler-Based) Scheduling

Post on 24-Jan-2016


StaticILP.12/12/02

Static ILP Static (Compiler Based) Scheduling

• Notes from UW-Madison

• Read ch. 4 of the textbook, and

• the paper on Itanium on the web page


Today’s Theme and Contents

• Let the compiler uncover the ILP
  – Objective: more ILP / simpler hardware / faster clock / less power

• How:
  – Static scheduling
  – Loop unrolling
  – Software pipelining
  – Static multiple issue: VLIW

    » local, global scheduling
    » static branch prediction
    » software speculation: trace scheduling, superblocks
    » NOPs, lockstep
    » conditional moves, predication
    » speculative loads

• IA-64 and Itanium


Basic Idea

• The compiler moves dependent instructions apart to avoid hazards

• This means:
  – such instructions exist (if not, employ transformations to create them)
  – the compiler knows implementation details
    » latency AND superscalarity (issue width)

• What happens if implementation changes?

• Static ILP techniques apply to both statically and dynamically scheduled processors

• Statically scheduled processors: the compiler dictates which instructions can execute together (scheduling done in software)
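The effect of moving dependent instructions apart can be illustrated with a small model. This is a sketch that is not part of the slides: it assumes a single-issue, in-order pipeline that stalls until an instruction's source registers are ready, and the function name `count_cycles`, the instruction tuples, and the latencies (2 cycles for a load, 1 for everything else) are all hypothetical.

```python
# Minimal sketch: count issue cycles on an in-order, single-issue pipeline.
# Each instruction is (name, dest, sources, latency), where latency is the
# assumed number of cycles after issue before the result can be read.

def count_cycles(program):
    ready = {}   # register -> first cycle in which its value can be read
    cycle = 0    # earliest cycle in which the next instruction may issue
    for name, dest, srcs, latency in program:
        # Stall until every source operand is available.
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        if dest is not None:
            ready[dest] = issue + latency
        cycle = issue + 1   # one instruction issues per cycle
    return cycle

# Two independent load/add/store chains, written one chain after the other.
naive = [
    ("LD",  "r1", ["a0"], 2),
    ("ADD", "r2", ["r1"], 1),
    ("SD",  None, ["r2", "a0"], 1),
    ("LD",  "r3", ["a1"], 2),
    ("ADD", "r4", ["r3"], 1),
    ("SD",  None, ["r4", "a1"], 1),
]

# The same instructions with each dependent pair moved apart.
scheduled = [naive[0], naive[3], naive[1], naive[4], naive[2], naive[5]]

print(count_cycles(naive), count_cycles(scheduled))
```

Under these assumed latencies the original order needs 8 issue cycles and the reordered one needs 6. Raising the assumed load latency to 3 makes the reordered sequence stall again, which is one way to read the question about what happens when the implementation changes.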


(Local Scheduling)


(useful for large iteration counts)


Software speculation/Global Scheduling


HOW??

Static prediction, profile, frequency, path

Which is better: the above or dynamic prediction?


Register pressure


Superblocking: overcomes some of the complexities of trace scheduling

Single vs. multiple entry


Does not have


Pentium 4 (3+ GHz) vs. Itanium (1 GHz)


Lockstep: any hazard stalls the whole issue group / NOPs fill the slots if there is not enough parallelism
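The cost of NOP padding can be sketched as follows. This is an illustration, not something taken from the slides: it assumes the compiler has already split the code into groups of mutually independent operations, and the names `pack_bundles` and `nop_fraction`, as well as the operation labels, are hypothetical.

```python
# Minimal sketch: pack groups of independent operations into fixed-width
# VLIW bundles, filling every unused slot with a NOP.

def pack_bundles(groups, width):
    bundles = []
    for group in groups:
        # A group larger than the issue width spills into further bundles.
        for i in range(0, len(group), width):
            chunk = group[i:i + width]
            bundles.append(chunk + ["nop"] * (width - len(chunk)))
    return bundles

def nop_fraction(bundles):
    # Share of issue slots that carry no useful work.
    slots = [op for bundle in bundles for op in bundle]
    return slots.count("nop") / len(slots)

groups = [["ld1", "ld2"], ["add1"], ["st1", "st2", "br"]]
bundles = pack_bundles(groups, width=3)
print(bundles, nop_fraction(bundles))
```

With a width of 3, the three groups above occupy three bundles and one third of the slots hold NOPs, which shows how limited parallelism turns directly into wasted code space and issue bandwidth.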


Predicated Execution & Conditional Moves

Convert control dependences to data dependences

if (a == 0) s = t;    (a in R1, s in R2, t in R3)

With a branch:

bnez  R1, L
addu  R2, R3, R0
L:

With a conditional move:

cmovz R2, R3, R1

Applying the above to all instruction types is called predication…

+/-?
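To see that the branch sequence and the conditional move compute the same result, here is a small model of both on a register dictionary. It is a sketch under the slide's register assignment (a in R1, s in R2, t in R3); the function names `with_branch` and `with_cmovz` are hypothetical, and Python's conditional expression only stands in for the select semantics of `cmovz`.

```python
# Minimal sketch: the same update expressed as a control dependence
# (branch) and as a data dependence (conditional move).

def with_branch(regs):
    """Models: bnez R1, L ; addu R2, R3, R0 ; L:"""
    r = dict(regs)
    if r["R1"] != 0:         # bnez R1, L: skip the move when R1 is non-zero
        return r
    r["R2"] = r["R3"] + 0    # addu R2, R3, R0 (R0 assumed to read as zero)
    return r

def with_cmovz(regs):
    """Models: cmovz R2, R3, R1 (R2 <- R3 if R1 == 0, else R2 unchanged)."""
    r = dict(regs)
    # No branch in the instruction stream: the condition is one more operand.
    r["R2"] = r["R3"] if r["R1"] == 0 else r["R2"]
    return r

for regs in ({"R1": 0, "R2": 10, "R3": 20}, {"R1": 5, "R2": 10, "R3": 20}):
    assert with_branch(regs) == with_cmovz(regs)
```

One point for the "+/-" discussion: the conditional move always occupies an issue slot and always reads R3, even when the condition fails, whereas the branch version risks a misprediction instead.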


Speculative Loads

Loads bypass earlier stores speculatively; repair code runs in case of misspeculation

Use an address buffer:

1. Lookup table: updated with the address of each speculative load

2. Updated with the addresses of intervening stores

3. A check instruction verifies that no store conflicted and releases the entry
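The three steps can be sketched as a small software model. This is an illustration under stated assumptions, not the hardware design from the slides: the class `AddressTable`, its method names, and the helper `load_above_store` are hypothetical, memory is a plain dictionary, all accesses are assumed to be the same size, and step 2 is modeled as removing any entry whose address matches the store.

```python
# Minimal sketch of an address buffer for speculative loads.

class AddressTable:
    def __init__(self):
        self.entries = {}   # destination register -> address of the load

    def speculative_load(self, reg, addr, memory):
        # Step 1: record the address of the speculative load.
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr, value, memory):
        # Step 2: an intervening store to a recorded address drops the entry.
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg):
        # Step 3: True if no store conflicted; the entry is released either way.
        return self.entries.pop(reg, None) is not None


def load_above_store(memory, load_addr, store_addr, store_value):
    """Program order is store then load; here the load is hoisted above the store."""
    table = AddressTable()
    value = table.speculative_load("r1", load_addr, memory)   # hoisted load
    table.store(store_addr, store_value, memory)              # intervening store
    if not table.check("r1"):
        value = memory[load_addr]                             # repair code: redo the load
    return value
```

When the two addresses differ, the check succeeds and the early value is used; when they match, the check fails and the repair path reloads the value written by the store.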


Let the compiler do the work

• All

• Most of it

• As long as it improves performance

• …


by Harsh Sharangpani and Ken Arora

see web page


Idea

The compiler has a larger instruction window than the hardware.

Communicate to the hardware more of the information gleaned at compile time.


Six instructions wide and ten stages deep

Tries to minimize the latency of the most frequent operations

Hardware support for compile-time indeterminacies


Software-initiated prefetch (requests filtered by the instruction cache)

Prefetch must be issued 12 cycles before the branch to hide latency

L2 -> streaming buffer -> instruction cache

Four-level branch predictor hierarchy to prevent a 9-cycle pipeline stall

Decoupling buffer holds up to 8 bundles of code (bundle?)


Conclusion/Future

• The compiler can do a lot of the work but needs hardware assistance

• Currently in pursuit of the best of both worlds

• Future:
  – How long will IA-32 last, and will IA-64 take over the IA-32 market?
  – Will IA-64 be the only ISA in the world?
