ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Upload: allen-mervin-reynolds

Post on 23-Dec-2015


Page 1: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

ECE 506

Reconfigurable Computing

Lecture 7

FPGA Placement

Page 2: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement

° VLSI Design Flow
• Objective:
- Minimize total chip area
- Sustain a routable circuit within the timing budget

° FPGA Flow
• Area fixed
• Objective:
- Assign LUTs in the netlist to available logic blocks in the array within utilization and performance constraints (interconnect)
- Locate functional blocks such that the interconnect required to route the signals between them is minimized.
• Target architecture determines the cost function

Page 3: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement algorithm

° Two basic inputs:
• netlist with functional blocks and the connections between them
• device map (architecture)

° The algorithm selects a legal location for each block such that the circuit wiring is optimized.

Page 4: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Significance of Placement

°Good placement is extremely important
• It sets constraints for routability.
• Even if the circuit does route, a poor placement will still lead to a lower maximum operating speed and increased power consumption.

°Finding a good placement is challenging
• A large commercial FPGA contains over 500,000 functional blocks,
- which gives on the order of 500,000! (factorial) possible placements.
• Exhaustive evaluation is therefore impossible.
• Placement is a computationally hard problem:
- there is no known algorithm that produces optimal results in practical central processing unit (CPU) time.
• Development of fast and effective heuristic placement algorithms is a critical research area.
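To get a feel for why exhaustive evaluation is out of reach, the sketch below estimates how many decimal digits 500,000! has. It assumes, as the slide does, that the count of placements is simply n! for n blocks; the function name is hypothetical.

```python
import math

def factorial_digits(n: int) -> int:
    """Number of decimal digits in n!, computed without forming n! itself."""
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1

digits = factorial_digits(500_000)
print(f"500,000! has roughly {digits:,} decimal digits")
```

The result is a number with a few million digits, so even evaluating one placement per nanosecond would not make a dent in the search space.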

Page 5: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Device Legality Constraints

°All resources are prefabricated in an FPGA
• This leads to a variety of placement legality constraints:

°A legal placement must place a functional block only in a location on the chip that can accommodate it.

• RAM block must be placed in a RAM location, and a lookup table (LUT) must be placed in a LUT location.

°Some groups of functional blocks must be placed in a specific relative orientation to make use of special, dedicated routing resources.

• arithmetic logic cells—to use the dedicated carry-chain hardware, the logic cells forming a carry chain must be placed adjacent to each other in the sequence required by the carry structure.
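The two kinds of legality constraint above can be sketched as simple checks. This is only an illustration: the `Site` type, the function names, and the rule that a carry chain runs up a single column one row at a time are assumptions, since the actual carry direction depends on the device.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    x: int
    y: int
    kind: str  # resource type at this location, e.g. "LUT" or "RAM"

def type_legal(block_kind: str, site: Site) -> bool:
    """A block may only occupy a site of the matching resource type."""
    return block_kind == site.kind

def carry_chain_legal(chain_sites: list[Site]) -> bool:
    """Assumed rule: consecutive chain cells share a column and sit in adjacent rows."""
    return all(
        nxt.x == cur.x and nxt.y == cur.y + 1
        for cur, nxt in zip(chain_sites, chain_sites[1:])
    )

print(type_legal("RAM", Site(0, 0, "LUT")))                       # False
print(carry_chain_legal([Site(2, 0, "LUT"), Site(2, 1, "LUT")]))  # True
```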

Page 6: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

FPGA Placement Constraints

°FPGA interconnect is prefabricated
• The amount of interconnect in each region of a device is fixed

°Routing congestion
• Occurs when the interconnect demand approaches or exceeds the fabricated wiring capacity in some part of the FPGA.
• A placement that requires more interconnect in a device region than that region contains cannot be routed.
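A minimal sketch of the congestion test described above: compare the estimated routing demand of each region with its fixed capacity. The region keys, the track counts, and the function name are made up for illustration.

```python
def congested_regions(demand: dict, capacity: dict) -> list:
    """Return the regions whose routing demand exceeds the fabricated capacity."""
    return sorted(region for region, d in demand.items() if d > capacity[region])

# Hypothetical numbers: tracks available and tracks needed per region.
capacity = {(0, 0): 40, (0, 1): 40, (1, 0): 40}
demand = {(0, 0): 25, (0, 1): 47, (1, 0): 40}

print(congested_regions(demand, capacity))  # [(0, 1)]
```

A placement with any region in this list cannot be routed, no matter how small its total wirelength is.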

Page 7: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

FPGA Placement Constraints

°Stratix-II is an island-style FPGA that contains routing segments that span 4, 16, and 24 logic blocks.

• Programmable switches allow routing segments in the same direction (horizontal or vertical) to be connected at their endpoints to create longer routes.

• Other programmable switches allow some horizontal routing segments to connect to vertical routing segments where they cross and vice versa.

[Figure: routing segments of length 1, 2, and 4 in the X and Y directions]

Page 8: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement Objective– Routability Driven

°Create a placement that minimizes the total interconnect required
• This increases the probability of successful routing

°Consequently, some routability-driven placement algorithms minimize not only the total wiring required by the design but also the amount of routing congestion.

Page 9: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement Objective – Timing Driven

° In addition to optimizing for routability, timing-driven algorithms use timing analysis

• to identify critical paths and/or connections

• to optimize the delay of those connections.

°Most delays in an FPGA are due to the programmable interconnect

• timing-driven placement can achieve a large improvement in circuit speed over routability-driven approaches.

Page 10: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Level of Control on Placement

°Commercial FPGA placement tools allow designers to control the placement

°Common types of placement directives:

°1) Exact location of a block
• The most restrictive
• Typical uses
- to lock down the design I/Os at the locations required by the circuit board, or to lock down the elements of a performance-critical intellectual property (IP) core.

°2) Area specific
• less restrictive
• forces blocks to go into a specific 2D area
• allows a designer to guide the placement tool

Page 11: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Level of Control on Placement

°3) Relative location
• specifies the relative location of several blocks
• the placement tool chooses exactly where to locate the block group
• Typical use
- for library components where a designer knows a good placement of the component blocks relative to each other.

°4) Floating region
• specifies that some logic should be placed within a tight region
• the placement tool can choose where that region should be on the device.

Page 12: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement Algorithms

• Constructive methods:
- Begin from the netlist and generate an initial placement.
- Partitioning method: Mincut
- First address placement of partitions individually
– Significant reduction in search space
- Then address placement of partitions relative to each other
- Not suitable for FPGAs
– Especially island-style FPGAs with limited routing resources
– Method postpones the impact of inter-partition connections
– Leads to increased demand on routing tracks

Page 13: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement

• Placement has a set of competing goals.
• Can’t optimize locally and globally simultaneously.
• Use heuristic approaches to evaluate quality.

[Figure: example circuit with signals A through F and blocks 1 and 2, mapped onto LUT1 and LUT2]

Page 14: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Getting Stuck with Local Minima

• Pick a random starting point
• Repeatedly swap:
- if the new state has a lower cost, it is accepted,
- otherwise the current state is retained.
• Greedily accept good moves
• Problem: large number of local minima
- The circuit, placed as shown at left, is in a local minimum.
- No swap of logic or I/O functions will reduce the total wirelength.
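The greedy procedure above can be sketched as follows. It is a toy model under stated assumptions: all nets are two-terminal, cost is total Manhattan wirelength, and the function names and the tiny example circuit are hypothetical.

```python
import random

def wirelength(pos, nets):
    """Total Manhattan length over two-terminal nets given as (a, b) pairs."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def greedy_place(pos, nets, tries=1000, seed=0):
    """Repeatedly swap two blocks; keep the swap only if the cost drops."""
    rng = random.Random(seed)
    pos = dict(pos)
    blocks = list(pos)
    cost = wirelength(pos, nets)
    for _ in range(tries):
        a, b = rng.sample(blocks, 2)
        pos[a], pos[b] = pos[b], pos[a]
        new_cost = wirelength(pos, nets)
        if new_cost < cost:
            cost = new_cost                   # accept the improving move
        else:
            pos[a], pos[b] = pos[b], pos[a]   # otherwise undo the swap
    return pos, cost

start = {"A": (0, 0), "B": (3, 0), "C": (1, 0), "D": (2, 0)}
nets = [("A", "B"), ("C", "D")]
placed, cost = greedy_place(start, nets)
print(wirelength(start, nets), "->", cost)
```

Because a cost-increasing swap is never kept, the search stops improving as soon as no single swap helps, which is exactly the local-minimum problem the slide describes.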

Page 15: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Technology Mapping to Placement

Mapping onto 5-LUT

Page 16: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Technology Mapping to Placement

Page 17: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Iterative Placement Algorithms

° Iterative improvement
• Begin with a random or constructive placement.
• Iterate to improve it.
• Pairwise interchange
• Hill climbing
- To avoid getting trapped in local minima, consider a “hill-climbing” approach.
- Need to accept worse solutions, or make “bad” moves, to reach the global minimum.
- Acceptance is probabilistic: only accept cost-increasing moves some of the time.

Page 18: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Iterative Placement Algorithms

°Methods
• Force-directed methods (classical mechanics)
- Force vector computed on each module corresponding to all nets
- Solve set of non-linear differential equations.
– FD relaxation
– FD pairwise exchange
• Simulated annealing (statistical mechanics)
- Models a physical annealing process, which minimizes energy.
- Analogous to the slow cooling of heated metal, as opposed to rapid “quenching.”
- Generates best results
- Can be time consuming
• Macro-based approaches
- Genetic algorithms

Page 19: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Physical Annealing

• Take a metal and heat it to a high temperature
• Allow it to cool slowly; the metal is annealed to a low temperature
• Atoms in the metal are at lower energy states after annealing
• The higher the initial temperature and the slower the cooling, the tougher the metal becomes.
• Atoms transition to high energy states and then move to low energy.

Page 20: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Simulated Annealing

• Optimization strategy based on the physical annealing process
• Generate random moves.
- Initially, accept moves that decrease cost as well as moves that increase it.
• As temperature decreases, the probability of accepting bad moves decreases.
• Eventually, default to a greedy algorithm
- Only accept moves that reduce cost
• Determine when to terminate.
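The steps above can be sketched as a small annealing loop. The slide does not give an acceptance formula, so this sketch assumes the standard Metropolis rule (accept a cost increase of delta with probability exp(-delta / T)) and a simple geometric cooling schedule; the parameter values and names are illustrative only.

```python
import math
import random

def cost(pos, nets):
    """Total Manhattan length over two-terminal nets given as (a, b) pairs."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def anneal(pos, nets, t_start=10.0, t_min=0.01, alpha=0.9,
           moves_per_t=100, seed=0):
    rng = random.Random(seed)
    pos = dict(pos)
    blocks = list(pos)
    cur = cost(pos, nets)
    best_pos, best = dict(pos), cur
    t = t_start
    while t > t_min:                          # termination: temperature floor
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            pos[a], pos[b] = pos[b], pos[a]   # random swap move
            delta = cost(pos, nets) - cur
            # Good moves are always taken; bad moves are taken with a
            # probability that shrinks as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur += delta
                if cur < best:
                    best_pos, best = dict(pos), cur
            else:
                pos[a], pos[b] = pos[b], pos[a]   # reject: undo the swap
        t *= alpha                            # assumed geometric cooling
    return best_pos, best

start = {"A": (0, 0), "B": (3, 0), "C": (1, 0), "D": (2, 0)}
nets = [("A", "B"), ("C", "D"), ("A", "D")]
placed, final = anneal(start, nets)
print(cost(start, nets), "->", final)
```

At high temperature almost every move is accepted; near the end exp(-delta / T) is close to zero for any cost increase, so the loop behaves like the greedy algorithm.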

Page 21: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Simulated Annealing

Page 22: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Bounding Box and Cost Function

°Bounding box underestimates wirelength
• q(n) is a compensation factor
- q is 1 for 2- and 3-terminal nets
- increases to 2.79 for 50-terminal nets
• Cav is the average channel capacity (tracks) in the x and y directions over the bounding box of net n
- penalizes placements that require more routing in areas of the FPGA that have narrower channels
- However, Cav is constant, since channel width is fixed for an island-style FPGA
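The formula itself did not survive in the transcript. The terms described above match the linear congestion cost function published for the VPR placer, which is usually written as below; treat it as a reconstruction under that assumption. Here bb_x(n) and bb_y(n) are the horizontal and vertical spans of the bounding box of net n, and β is an exponent that sets how strongly narrow channels are penalized.

```latex
\mathrm{Cost} \;=\; \sum_{n=1}^{N_{\mathrm{nets}}} q(n)
  \left[
    \frac{bb_x(n)}{C_{av,x}(n)^{\beta}}
    \;+\;
    \frac{bb_y(n)}{C_{av,y}(n)^{\beta}}
  \right]
```

When the channel capacity is the same everywhere, the denominators are constant and the cost reduces to a q(n)-weighted sum of bounding-box half-perimeters.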

Page 23: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Placement Flow

Page 24: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Wire length measures

° Estimate wire length by distance between components.

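° Possible distance measures (x and y are the horizontal and vertical offsets between two components):
• Euclidean distance (sqrt(x^2 + y^2));
• Manhattan distance (|x| + |y|).

° Multi-point nets must be broken up into trees for good estimates.

[Figure: Euclidean versus Manhattan distance between two points]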
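A short sketch of the two distance measures, plus the bounding-box (half-perimeter) estimate for a multi-point net that the "Bounding Box and Cost Function" slide relies on. Function names are illustrative.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    """Distance when wires may only run horizontally or vertically."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def half_perimeter(pins):
    """Bounding-box estimate for a net with any number of pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

print(euclidean((0, 0), (3, 4)))                 # 5.0
print(manhattan((0, 0), (3, 4)))                 # 7
print(half_perimeter([(0, 0), (3, 4), (1, 6)]))  # 9
```

For a two-pin net the half-perimeter equals the Manhattan distance; for nets with more pins it is only a lower bound, which is why a compensation factor is applied.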

Page 25: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Weighted Graph -> Distance Table

° Geometric distance is NOT accurate!
° Need a weighted graph
• Cost of routing resources

° Finding the shortest path at each step of annealing is costly
• Need for a lookup table

Page 26: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Simulated Annealing – Moves per iteration

Moves_per_iteration = B * N^(4/3)

• N = # of logic blocks and I/O pads
• B = scaling factor
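As a quick worked example of the formula above, the sketch below evaluates B * N^(4/3) for a couple of sizes. The function name and the values of B are illustrative; the slide does not state which B is used.

```python
def moves_per_iteration(n_blocks: int, b: float = 1.0) -> int:
    """Number of moves attempted at each temperature: B * N^(4/3)."""
    return round(b * n_blocks ** (4 / 3))

print(moves_per_iteration(1000))         # 10000
print(moves_per_iteration(1000, b=10))   # 100000
```

The work per temperature grows faster than linearly in N, so doubling the circuit size multiplies the moves per temperature by about 2.5.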

Page 27: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Simulated Annealing – Swapping Range

• Swap distance is adjusted based on the acceptance rate as well.
• Initially set to the entire FPGA
• As T drops, the distance drops.
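The slide does not give the update rule. One published rule, used in the VPR placer, scales the range limit so that the acceptance rate stays near 0.44; the sketch below assumes that rule, and the function and parameter names are hypothetical.

```python
def update_range_limit(r_old: float, accept_rate: float, max_dim: int,
                       target: float = 0.44) -> float:
    """Grow the swap range when many moves are accepted, shrink it otherwise."""
    r_new = r_old * (1.0 - target + accept_rate)
    # Keep the limit between one block and the full device dimension.
    return min(max(r_new, 1.0), float(max_dim))

print(update_range_limit(20.0, 0.90, max_dim=40))  # range grows
print(update_range_limit(20.0, 0.10, max_dim=40))  # range shrinks
```

Early in the anneal the acceptance rate is high, so the limit stays at the full device; as the temperature falls and fewer moves are accepted, swaps become increasingly local.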

Page 28: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Simulated Annealing

• New T depends on the fraction of attempted moves that were accepted.

• Reduces rapidly when acceptance rate is high

• When the temperature is less than a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.
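A sketch of an acceptance-rate-driven temperature update. The thresholds and multipliers below are the ones reported for VPR's adaptive schedule; the slide does not list them, so they are an assumption here, as is the function name.

```python
def next_temperature(t: float, accept_rate: float) -> float:
    """Pick the cooling multiplier from the fraction of accepted moves."""
    if accept_rate > 0.96:
        gamma = 0.5    # nearly everything accepted: cool quickly
    elif accept_rate > 0.8:
        gamma = 0.9
    elif accept_rate > 0.15:
        gamma = 0.95   # most productive range: cool slowly
    else:
        gamma = 0.8    # almost nothing accepted: finish up
    return gamma * t

print(next_temperature(100.0, 0.99))  # 50.0
print(next_temperature(100.0, 0.50))  # 95.0
```

The idea is to spend most of the move budget at temperatures where a moderate fraction of moves is accepted, since that is where the placement improves the most.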

Page 29: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Annealing Criteria

• Contemporary FPGA packages use the following parameters:

1. Starting temp – 20 * std_dev(cost of N swaps)

2. Cost function – weighted sum of wire length and delay

3. Inner loop – B * N^(4/3)
• Beta cost function

4. Stopping criteria –
• T < 0.005 * Cost / N_nets
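Items 1 and 4 above translate almost directly into code. The sketch assumes the population standard deviation is meant (the slide does not say which form) and that the list of costs comes from N trial swaps made before annealing starts; the function names are hypothetical.

```python
import statistics

def starting_temperature(swap_costs) -> float:
    """Item 1: 20 times the standard deviation of the cost over N trial swaps."""
    return 20.0 * statistics.pstdev(swap_costs)

def should_stop(t: float, cost: float, n_nets: int) -> bool:
    """Item 4: stop once T falls below 0.005 times the average cost per net."""
    return t < 0.005 * cost / n_nets

costs = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up costs after 8 trial swaps
print(starting_temperature(costs))
print(should_stop(0.001, cost=1000.0, n_nets=500))  # True
```

Tying both values to the observed costs makes the schedule independent of the absolute scale of the cost function.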

Page 33: ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement

Strengths of SA that make it suitable for FPGAs

°Can enforce all the legality constraints imposed by the FPGA architecture fairly directly

• By forbidding the creation of illegal placements in the move generator

• By adding a penalty cost to illegal placements.

°Can directly model the impact of the FPGA routing architecture on circuit delay and routing congestion

• By creating an appropriate cost function
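The two ways of enforcing legality listed above can be sketched side by side. Both functions, the site map, and the penalty weight are hypothetical; a real placer would use richer device data.

```python
import random

def propose_legal_site(block_kind, sites, rng):
    """Option 1: the move generator only ever proposes type-compatible sites."""
    legal = [loc for loc, kind in sites.items() if kind == block_kind]
    return rng.choice(legal)

def penalized_cost(base_cost, n_violations, penalty=1000.0):
    """Option 2: allow illegal placements but make them very expensive."""
    return base_cost + penalty * n_violations

sites = {(0, 0): "LUT", (0, 1): "RAM", (1, 0): "LUT"}
rng = random.Random(0)
print(propose_legal_site("RAM", sites, rng))  # (0, 1), the only RAM site
print(penalized_cost(50.0, 2))                # 2050.0
```

Filtering in the move generator guarantees that every intermediate placement is legal, while the penalty approach lets the annealer pass through illegal states on the way to a better legal one.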