Bottleneck Analysis and Alleviation in Pipelined Systems: A Fast Hierarchical Approach

Gennette Gill, Montek Singh
Univ. of North Carolina, Chapel Hill, NC, USA




TRANSCRIPT

Page 1: Bottleneck Analysis and Alleviation in Pipelined Systems: A Fast Hierarchical Approach

Gennette Gill, Montek Singh
Univ. of North Carolina, Chapel Hill, NC, USA

Page 2: Part of a Larger Design Flow

Big Picture:
• High-level specifications to asynchronous implementations
• Design space exploration (this work) is part of overall flow

[Figure: High-level Specification, then Implementation]

This work:
• Use various optimizations together in one tool
• Exploits circuit hierarchy to accelerate analysis/optimization

Page 3: Our Contribution

• Identify bottlenecks in a pipelined system
  • Recognize multiple components that limit throughput
  • Bottlenecks represented in a Boolean expression
• Classify bottlenecks
  • Latency, cycle time, and occupancy dependent
• Choose which transformation(s) apply
  • Given a list of possible transforms
  • List is open ended; allows for additions

Page 4: Background: Pipelines and Canopy Graphs

Page 5: Background: Asynchronous Pipelines

Each stage characterized by three delays:
• Forward latency, Lf: time for data to propagate forward
• Reverse latency, Lr: time for a stage to receive and process ack (time for a ‘hole’ to travel backward)
• Cycle time, T = Lf + Lr (typically)
Throughput, tpt = 1 / cycle time

[Figure: An abstracted view of the pipeline. Stages are labeled Lf/Lr; each stage has a controller, a block labeled L, and logic, linked by req and ack signals. Caption: Cycle time in an asynchronous pipeline]
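The relations on this slide can be written out as a minimal Python sketch. The class name `Stage` and the example delay values are illustrative assumptions, not taken from the slides; only T = Lf + Lr and tpt = 1 / T come from the text.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage (hypothetical helper type, for illustration only)."""
    forward_latency: float  # Lf: time for data to propagate forward
    reverse_latency: float  # Lr: time for a hole to travel backward

    @property
    def cycle_time(self) -> float:
        # T = Lf + Lr (the "typical" case stated on the slide)
        return self.forward_latency + self.reverse_latency

    @property
    def throughput(self) -> float:
        # tpt = 1 / cycle time
        return 1.0 / self.cycle_time


# Assumed example values: Lf = 3, Lr = 1 (arbitrary time units)
stage = Stage(forward_latency=3.0, reverse_latency=1.0)
print(stage.cycle_time, stage.throughput)
```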

Page 6: Background: Pipeline Rings

Throughput of ring depends on occupancy (#items)
• For small #items: underutilization limits throughput
• For small #holes: congestion limits throughput
• Throughput also limited by the slowest stage
• Graph is a convex shape: “Canopy Graph”

[Figure: Canopy graph. Ring throughput vs. ring occupancy (0, 1, 2, ..., N-2, N-1, N), with three regions: data limited, limited by slowest stage, hole limited]
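A canopy graph for a ring can be sketched as the minimum of three limits. The slide gives only the qualitative shape; the slopes used here (1 / sum of forward latencies for the data-limited side, 1 / sum of reverse latencies for the hole-limited side) are an assumption based on the usual ring model, and `ring_canopy` is a hypothetical helper name.

```python
def ring_canopy(k, lf, lr):
    """Throughput of a ring holding k items (illustrative sketch).

    lf, lr: per-stage forward and reverse latencies; the ring has len(lf) stages.
    """
    n = len(lf)
    # Few items: each item must travel all the way round the ring (assumed model).
    data_limited = k / sum(lf)
    # Few holes: each hole must travel backward all the way round (assumed model).
    hole_limited = (n - k) / sum(lr)
    # No stage can cycle faster than its own cycle time T = Lf + Lr.
    slowest_stage = 1.0 / max(f + r for f, r in zip(lf, lr))
    return max(0.0, min(data_limited, hole_limited, slowest_stage))


# Assumed example: a 10-stage ring with Lf = Lr = 1 in every stage
lf = [1.0] * 10
lr = [1.0] * 10
for k in range(0, 11):
    print(k, ring_canopy(k, lf, lr))
```

Throughput rises with occupancy, is capped by the slowest stage, and falls again as holes run out, which matches the three regions in the figure.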

Page 7: Background: Composition

[Figure: Sequential Composition [Lines98]. Pipelines A and B and their combination AB; combined pipeline throughput vs. pipeline occupancy]

[Figure: Parallel Composition [Lines98]. Pipelines A and B and their combination AB; combined pipeline throughput vs. pipeline occupancy]

Page 8: Conditionals

Conditional branches:
• Implement if-then-else non-speculatively
• Split sends data along only one path
• Boolean decision determines path
• Merge also uses Boolean; maintains order of data

Performance depends on:
• Canopy graphs of then and else branches
• Boolean probability of choosing each branch

[Figure: data in and boolean enter a fork and a split; then and else branches; merge produces data out]

Page 9: Conditionals

Simplifying assumption (relaxed later)
• Boolean choices evenly distributed given a probability, e.g. p0 = 2/3 → 001001001…

Constraints on joint operation:
• Each branch’s occupancy (k) ∝ its probability: $\frac{k_0}{p_0} = \frac{k_1}{p_1}$
  Why? Because items must exit in order
• Each branch’s throughput ∝ its probability: $\frac{tpt_0}{p_0} = \frac{tpt_1}{p_1}$

Throughput of composition:
• Scale each branch’s canopy graph by 1/pi
• Intersect the scaled canopy graphs

[Figure: data in and boolean enter a fork and a split; then and else branches; merge produces data out]
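The scale-and-intersect rule can be sketched in Python as follows. The function names and the trapezoidal branch model are assumptions made for illustration; only the constraints k_i ∝ p_i and the scaling by 1/p_i come from the slide, and the limits imposed by the split and merge stages themselves are ignored here.

```python
def trapezoid_canopy(n_stages, lf, lr, cycle_time):
    """Return a canopy function k -> throughput for a uniform linear pipeline.

    Hypothetical branch model: every stage has the same Lf, Lr, and cycle time.
    """
    def canopy(k):
        data_limited = k / (n_stages * lf)
        hole_limited = (n_stages - k) / (n_stages * lr)
        return max(0.0, min(data_limited, hole_limited, 1.0 / cycle_time))
    return canopy


def conditional_throughput(total_items, p0, canopy0, canopy1):
    """Throughput of an if-then-else with total_items spread over both branches.

    Requires 0 < p0 < 1. Occupancy is divided in proportion to the branch
    probabilities (k_i = p_i * K), and each branch's throughput is scaled by
    1/p_i; the composition is the lower of the two scaled curves.
    """
    p1 = 1.0 - p0
    k0 = p0 * total_items
    k1 = p1 * total_items
    return min(canopy0(k0) / p0, canopy1(k1) / p1)


# Assumed example branches: 10 fast stages vs. 9 slower stages
branch0 = trapezoid_canopy(n_stages=10, lf=1.0, lr=1.0, cycle_time=2.0)
branch1 = trapezoid_canopy(n_stages=9, lf=1.0, lr=1.0, cycle_time=4.0)
result = conditional_throughput(total_items=9, p0=2 / 3, canopy0=branch0, canopy1=branch1)
print(result)
```

Sweeping `total_items` traces the composed canopy graph; sweeping `p0` at a fixed occupancy shows how the limiting branch changes with the branch probability.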

Page 10: Conditionals: A Simple Example

Example: pipelined implementation of CRC algorithm

[Figure: conditional pipeline with split and merge; branch0 and branch1 (labels: 10 stages, 9 stages), with per-stage delay labels such as 1/1, 5/1, 3/1, and 2/1]

[Plot: throughput vs. occupancy (0 to 50) for branch0 and branch1]

[Plot: throughput vs. probability (0.0 to 1.0), given by $\min\left(\frac{tpt_0}{p_0}, \frac{tpt_1}{p_1}\right)$]

Page 21: Conditionals: Example with Slack Mismatch

Slack mismatch implicitly handled by analysis method

[Figure: conditional pipeline with split and merge; branch0 and branch1 (labels: 10 stages, 9 stages), with per-stage delay labels such as 1/1, 5/1, 3/1, and 2/1]

[Plots: throughput vs. occupancy (0 to 40) for branch0 and branch1, in two cases: slack matched and slack mismatch]

Page 22: Conditionals: Generalized Choice Model

Extend to more general choice model:
• Until now: assumed non-clustered decisions
• Now: consider clustering
  • Allow arbitrary runs of 0’s and 1’s for decisions
  • Long runs reduce throughput: other branch is underutilized
• Our Analysis Approach:
  • Introduces a “clustering factor” to quantify decision run lengths
    e.g., for random uncorrelated data: ave. run length of 0’s is 1/p1
  • Analysis approach can handle arbitrary amounts of clustering

[Plot: throughput vs. occupancy (0 to 20)]

[Plot: throughput vs. probability of choosing branch1 (0.0 to 1.0), with curves for non-clustered and random decisions; annotation: acts as bottleneck]
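The run-length figure quoted for random uncorrelated decisions can be checked numerically. This is a small simulation sketch; the function name, sample size, and seed are arbitrary choices, and it does not reproduce the clustering factor itself, which the slide does not define.

```python
import random
from itertools import groupby


def mean_zero_run_length(p1, n_decisions=200_000, seed=0):
    """Average length of runs of 0's in a stream of independent decisions.

    Each decision is 1 with probability p1 and 0 otherwise.
    """
    rng = random.Random(seed)
    decisions = [1 if rng.random() < p1 else 0 for _ in range(n_decisions)]
    # groupby yields maximal runs of equal consecutive values.
    zero_runs = [len(list(run)) for bit, run in groupby(decisions) if bit == 0]
    return sum(zero_runs) / len(zero_runs)


p1 = 1 / 3
print(mean_zero_run_length(p1), "vs. expected", 1 / p1)
```

A run of 0's ends as soon as a 1 is drawn, so its length is geometrically distributed with mean 1/p1, and the simulated average should land close to that value.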

Page 23: Pipelined Loops

[Figure: pipelined loop with stages s1 to s8, a loop interface, the condition i < N, and the increment i++]

Analysis approach can handle single-token and multi-token loops

Loop’s throughput depends on #iterations per item
• Assume given: #iterations/item or prob. of exiting the ring
• Note: Previous analysis looked at a different throughput: #iterations/second, not #completions/second

Page 24: Pipelined Loops

Analysis approach for loops:
• Construct canopy graph for loop body
• Scale down based on expected number of iterations

[Figure: loop with fork, join, and loop interface around a Boolean-controlled conditional (branch0, branch1); all stages labeled 5/1]

[Plot: throughput vs. occupancy (0 to 8) for the loop body and the overall loop]
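The scaling step can be sketched as below. Two assumptions are made that the slides do not state: that each pass through the loop exits independently with a fixed probability (so the expected iteration count is 1 / p_exit), and that only the throughput axis is scaled. Both function names are hypothetical.

```python
def expected_iterations(exit_probability):
    """Expected number of loop iterations per item.

    Assumes each pass exits independently with the same probability,
    so the iteration count is geometric with mean 1 / exit_probability.
    """
    if not 0.0 < exit_probability <= 1.0:
        raise ValueError("exit_probability must be in (0, 1]")
    return 1.0 / exit_probability


def loop_completion_throughput(body_throughput, iterations_per_item):
    """Convert #iterations/second of the loop body into #completions/second."""
    return body_throughput / iterations_per_item


# Assumed example: body sustains 0.1 iterations per time unit, exit probability 0.25
iters = expected_iterations(0.25)
print(iters, loop_completion_throughput(0.1, iters))
```

This separates the two throughput notions named on the previous slide: the loop body's canopy graph gives iterations per second, and dividing by the iterations per item gives completions per second.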

Page 25: Bottleneck Identification

Page 26: Our Definition of Bottleneck

• Set of hierarchical nodes that limit canopy graph
• Expressed as a Boolean combination
  e.g. n0 OR n2 OR n3 OR n5 AND n6

[Figure: hierarchy tree of nodes n0 to n6 (par, seq, and leaf nodes) beside the corresponding pipeline; canopy graph (throughput vs. occupancy) with one segment highlighted]

What caused this segment?
• Usually more than one node
• Often several conspire together

Page 27: Find Limiting Segments

• Begin with top segment of root node
• Find which child/children contribute to segment
  • If more than one, is it AND or OR blame?
• Continue to lower levels of hierarchy

[Figure: par node n0 with children n1 and n2; canopy graph (throughput vs. occupancy) with the question: What sets this limit?]

Page 28: Find Limiting Segments: example 1

Scenario: parallel operator limited by slow child
• n1 and n2 contribute to bottleneck
• bottleneck(top_n0) = top_n0 OR bottleneck(top_n1) AND bottleneck(top_n2)
• next, find which children of n1 and n2 limit throughput

[Figure: par node n0 with children n1 and n2; canopy graphs (throughput vs. occupancy) of n1 and n2, with the question: What sets this limit?]

Page 29: Find Limiting Segments: example 2

Scenario: parallel operator limited by slack mismatch
• changing n3 or n4 could fix bottleneck
• bottleneck(top_n2) = top_n2 OR bottleneck(reverse_n3) OR bottleneck(forward_n4)

To fix, change n2, n3, or n4

[Figure: par node n2 with children n3 and n4; canopy graphs (throughput vs. occupancy) of n3 and n4, with the question: What sets this limit?]

Page 30: Find Limiting Segments: example 3

Scenario: sequential operator limited by forward latency
• changing n5 or n6 could fix the bottleneck
• bottleneck(forward_n4) = forward_n4 OR bottleneck(forward_n5) OR bottleneck(forward_n6)

If just one child is slow, only one contributes

[Figure: seq node n4 with children n5 and n6; canopy graphs (throughput vs. occupancy) of n5, n6, and n4, with the question: What sets this limit?]
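The three examples share one recursive pattern: a segment's bottleneck is the segment itself OR a Boolean combination of its children's bottlenecks. A minimal sketch of that expansion is shown below. The `BLAME` table is a hypothetical stand-in for what the canopy-graph analysis would produce; here it is filled in by hand from the three example slides, and the string form of the output is an illustrative choice.

```python
# Hypothetical blame table: (segment, node) -> (operator, contributing child segments).
# Entries are copied by hand from examples 1 to 3; a real tool would derive them.
BLAME = {
    ("top", "n0"): ("AND", [("top", "n1"), ("top", "n2")]),
    ("top", "n2"): ("OR", [("reverse", "n3"), ("forward", "n4")]),
    ("forward", "n4"): ("OR", [("forward", "n5"), ("forward", "n6")]),
}


def bottleneck(segment, node, blame=BLAME):
    """Expand bottleneck(segment_node) into a fully parenthesized Boolean expression."""
    own = f"{segment}_{node}"
    if (segment, node) not in blame:
        return own  # leaf: only the node itself can be changed
    operator, children = blame[(segment, node)]
    parts = []
    for child_segment, child_node in children:
        expr = bottleneck(child_segment, child_node, blame)
        # Parenthesize compound sub-expressions so AND/OR precedence is explicit.
        parts.append(f"({expr})" if " " in expr else expr)
    combined = f" {operator} ".join(parts)
    return f"{own} OR ({combined})"


print(bottleneck("forward", "n4"))
print(bottleneck("top", "n0"))
```

Walking the table from the root's top segment follows the procedure of starting at the root node and continuing to lower levels of the hierarchy.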

Page 31: Bottleneck Alleviation

Page 32: Bottleneck Categorization

Categories based on which canopy graph segment limits throughput:
• Type I: Latency Dependent
• Type II: Cycle Time Dependent
• Type III: Occupancy Dependent

[Figure: canopy graph (throughput vs. occupancy) with segments labeled forward_C, top_C, reverse_{1,C}, and reverse_{0,C}]

Page 33: TRansformations for Increasing the Canopy Graph

• A TRIC increases throughput for some occupancies
• Idea: collect a bag of TRICs
  • Categorize circuit optimizations by bottleneck type
  • Use different optimizations in one framework
• Effects of a few example TRICs:

[Figure: three canopy graphs (throughput vs. occupancy) for Parallelization, Buffer Insertion, and Stage Splitting. Annotations, in extraction order: Fixes: Type I; Fixes: Type III, Causes: Type I; Fixes: Type II, Causes: Types I, III]

Suggestions for additional TRICs needed.

Page 34: Applying TRICs

• Tool lists TRICs that alleviate current bottleneck
• Designer chooses one option
• Check for next bottleneck as needed

TRIC               Type I   Type II   Type III
Coalescing         ✔        X         X
Parallelization    ✔        -         X
Stage Splitting    X        ✔         ✔
Loop Pipelining    X        ✔         ✔
Duplication        -        ✔         ✔
Loop Unrolling     -        ✔         ✔
Buffer Insertion   X        -         ✔
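The lookup step ("tool lists TRICs that alleviate current bottleneck") can be sketched as a table query. The marks are copied from the table above; reading ✔ as "alleviates this bottleneck type" is an assumption, the X and - marks are stored but not interpreted, and the names `TRIC_TABLE` and `alleviating_trics` are hypothetical.

```python
BOTTLENECK_TYPES = ("I", "II", "III")

# Marks per TRIC for Type I, Type II, Type III, as given in the table.
TRIC_TABLE = {
    "Coalescing":       ("✔", "X", "X"),
    "Parallelization":  ("✔", "-", "X"),
    "Stage Splitting":  ("X", "✔", "✔"),
    "Loop Pipelining":  ("X", "✔", "✔"),
    "Duplication":      ("-", "✔", "✔"),
    "Loop Unrolling":   ("-", "✔", "✔"),
    "Buffer Insertion": ("X", "-", "✔"),
}


def alleviating_trics(bottleneck_type):
    """List the TRICs marked ✔ for the given bottleneck type ("I", "II", or "III")."""
    column = BOTTLENECK_TYPES.index(bottleneck_type)
    return [name for name, marks in TRIC_TABLE.items() if marks[column] == "✔"]


for bottleneck_type in BOTTLENECK_TYPES:
    print(bottleneck_type, alleviating_trics(bottleneck_type))
```

Because the table is plain data, adding a new TRIC is a one-line change, which fits the open-ended list described in the contribution slide.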

Page 35: Results

Successful with 20% throughput goals on examples

Example       Throughput               # iter   Type             TRICs
              orig     goal    final            I    II   III
CRC           286      342     345     4        1    0    3      coalesce; add buffers
Cordic cond   90.9     109     111     2        0    0    2      add buffers
Cordic        83.3     100     101     2        0    1    2      split stages
Diffeq        182      218     267     1        3    0    0      split stages; duplicate
Mult          38.4     46.2    62.5    6        5    0    1      coalesce; add buffers

Suggest examples, please.
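The throughput columns can be checked with a few lines of arithmetic. The numbers below are copied from the results table; the variable and function names are illustrative, and the throughput units are not stated in the slides.

```python
# (example, original throughput, goal throughput, final throughput)
RESULTS = [
    ("CRC", 286, 342, 345),
    ("Cordic cond", 90.9, 109, 111),
    ("Cordic", 83.3, 100, 101),
    ("Diffeq", 182, 218, 267),
    ("Mult", 38.4, 46.2, 62.5),
]


def gain_percent(orig, final):
    """Relative throughput improvement over the original design, in percent."""
    return 100.0 * (final - orig) / orig


for name, orig, goal, final in RESULTS:
    status = "met" if final >= goal else "missed"
    print(f"{name}: goal {status}, gain {gain_percent(orig, final):.1f}%")
```

Every example ends at or above its goal, and Diffeq and Mult overshoot it by a wide margin.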

Page 36: Conclusion & Future Work

This Work:
• Employed multiple microarch. optimizations in one tool
• User-guided application to a few examples

More is needed:
• Clever ways to automate
• Additions to the bag of TRICs
• More examples and applications