University of British Columbia
Dept. of Electrical and Computer Engineering

November 30, 2007

A Combined Clustering and Placement Algorithm for FPGAs

Mark Yamashita

Contributions

• New algorithm to do clustering and placement

• Novel approach that trades off depth to control duplication

• Timing model/placement incorporated into clustering

• Delay improves by an average of 11%

• Controllable trade-off between area overhead and delay improvements

• Plan to submit to FPL '08

Motivation

• FPGAs need to be faster

  • 4x slower than ASICs

• Limitations of existing clustering approaches:

  • No depth control during clustering, often greedy

  • Provide no means for duplication, or

  • Use duplication in excess

  • Inaccurate timing models

Motivation

• Goal: improve critical-path delay by improving clustering

• Approach:

  • Use placement information to form an accurate timing model

  • Make better clustering decisions

  • Use duplication to reduce depth

  • Take advantage of otherwise unused logic in the FPGA

  • Control the amount of duplication by relaxing depth

Algorithm Overview

[Figure: algorithm overview; only the label "T-VP" survives]

Phase 1: Microcluster Formation

Phase 1: Example

Phase 1: Lawler Levitt Turner Algorithm
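
The slide body did not survive extraction, so the following is only a minimal sketch of Lawler-Levitt-Turner style labelling as it is commonly described, under a unit-delay assumption. It is not the presenter's implementation. The function name `lawler_cluster`, the dictionary-based netlist, and the `capacity` parameter are hypothetical. A node keeps the largest label among its ancestors if the ancestors carrying that label, plus the node itself, still fit in one cluster; otherwise its label increases by one. Cluster roots then pull in all same-label ancestors, which is where node duplication comes from.

```python
from graphlib import TopologicalSorter

def lawler_cluster(preds, capacity):
    """preds maps each node to the nodes driving it; capacity is the
    assumed maximum number of nodes per cluster."""
    labels, ancestors = {}, {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        anc = set()
        for u in preds.get(v, ()):
            anc |= {u} | ancestors[u]
        ancestors[v] = anc
        if not anc:
            labels[v] = 0  # node with no drivers
            continue
        p = max(labels[u] for u in anc)
        same = sum(1 for u in anc if labels[u] == p)
        # Keep label p only if v still fits alongside those ancestors.
        labels[v] = p if same < capacity else p + 1

    succs = {v: [] for v in labels}
    for v, drivers in preds.items():
        for u in drivers:
            succs[u].append(v)

    clusters = {}
    for v in labels:
        # A root is a sink, or drives at least one node with another label.
        is_root = not succs[v] or any(labels[w] != labels[v] for w in succs[v])
        if is_root:
            clusters[v] = {v} | {u for u in ancestors[v] if labels[u] == labels[v]}
    return labels, clusters
```

For a node `a` that drives both `b` and `c` with `capacity=2`, this sketch produces two clusters that each contain a copy of `a`, which illustrates the duplication that the later slides aim to reduce.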

Phase 1

Phase 1: Node Duplication Reduction

Phase 1: Block Usage Results

[Bar chart: total blocks (axis 0 to 20000) per MCNC circuit for three methods: TVPack, Lawler, Reduced. Circuits: tseng, ex5p, apex4, dsip, misex3, diffeq, alu4, des, bigkey, seq, apex2, s298, frisc, elliptic, spla, pdc, ex1010, s38417, s38584.1, clma.]

Phase 1: Additional Duplication Reduction Through Depth Relaxation

[Chart: Tcrit [ns] (axis 11.5 to 13.1) and CLB count (axis 200 to 500) by clustering method: Lawlers, Single Pass, 70%, 50%, 30%, 20%, 10%, 5%, TVPack.]

Algorithm Overview

[Figure: algorithm overview; only the label "T-VP" survives]

Phase 2: Microcluster Compaction with Orchestrator

• Iteratively move microclusters to improve timing

• Can fit multiple microclusters to the same CLB position, provided the aggregate of all microclusters meets CLB constraints

• If an area constraint is given, remove duplication and fragmentation until constraint is met
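
The co-location rule above can be sketched as a feasibility check. This is only an illustration under stated assumptions: the slides do not list the CLB constraints, so the limits used here (`max_blocks` logic blocks and `max_inputs` distinct external input nets per CLB) are assumed, and the `Microcluster` type and `fits_in_clb` function are hypothetical names. Nodes duplicated across microclusters at the same position are counted once, since the aggregate is taken as a set union.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Microcluster:
    nodes: frozenset    # logic blocks in this microcluster
    inputs: frozenset   # nets read from outside this microcluster
    outputs: frozenset  # nets driven by this microcluster

def fits_in_clb(microclusters, max_blocks, max_inputs):
    """Return True if the microclusters can share one CLB position."""
    # Aggregate logic: a node duplicated in several microclusters counts once.
    nodes = set().union(*(m.nodes for m in microclusters))
    # Nets produced inside the aggregate do not consume CLB input pins.
    produced = set().union(*(m.outputs for m in microclusters))
    consumed = set().union(*(m.inputs for m in microclusters))
    external_inputs = consumed - produced
    return len(nodes) <= max_blocks and len(external_inputs) <= max_inputs
```

A compaction loop would call a check like this before moving a microcluster onto an occupied CLB position and reject the move if it returns `False`.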

Phase 2: Orchestrator Example

Results: Timing

[Bar chart: Tcrit [ns] (axis 0 to 25) per MCNC benchmark, T-VPack vs. Orchestrator. Benchmarks: dsip, bigkey, des, misex3, seq, apex4, alu4, ex5p, s38584.1, apex2, diffeq, tseng, spla, ex1010, pdc, s38417, elliptic, s298, clma, frisc.]

Results: Area

[Chart: CLBs used (axis 0 to 1400) and Tcrit [ns] (axis 0 to 25) per MCNC benchmark, T-VPack vs. Orchestrator.]

Results: Timing vs. Area

[Chart: Tcrit [ns] (axis 11.5 to 14) and CLBs (axis 200 to 400) by clustering setting: Unlimited, Min +3, Min +2, Min +1, Minimum, TVPack.]

Results: Timing vs. Depth

[Plot: timing improvement (axis -5% to 25%) versus depth improvement (axis 0% to 60%).]

Conclusions

• Reducing depth contributes to a reduction in critical path delay

• Node duplication, when used effectively, reduces critical path delay

• Duplication can be used to provide a performance-area tradeoff to the designer

Future Work

• Promising post-placement optimizations:

  • Retiming

    • Leverage a more significant depth reduction

  • Logic reintroduction

    • Create duplication to increase performance

Contributions

• New algorithm to do clustering and placement

• Novel approach that trades off depth to control duplication

• Timing model/placement incorporated into clustering

• Delay improves by an average of 11%

• Controllable trade-off between area overhead and delay improvements

• Plan to submit to FPL '08

Thank You