Advanced algorithms and research applications

Laurent Lemarchand, LISyC/UBO, [email protected]


Post on 21-Nov-2020


Page 1: Advanced algorithms and research applicationslabsticc.univ-brest.fr/~lemarch/ENG/Cours/algoP3avanENG.pdf · Covering (unit delay) Flowmap 3 High level synthesis placement routing

1

Advanced algorithms and research applications

Laurent Lemarchand, LISyC/UBO

[email protected]@univ-brest.fr

2

Logic synthesis for LUT-based FPGA: presentation

LUT-based FPGA
Synthesis flow
Boolean networks for circuit synthesis
Large-scale problems: parallelism and partitioning (cf. IEEE TCAD, 01/2012)
Algorithms

3

Logic synthesis for LUT-based FPGA: synthesis flow

LUT-based circuits
Cells
Routing

4

Logic synthesis for LUT-based FPGA: synthesis flow

Flow: high-level synthesis, logic synthesis (logic optimization, then technology mapping), placement, routing

Runtimes for a circuit of 1,000 4-LUTs:

Step                      Tool        Runtime (min)
Simplification            Mis II      2
K-bounded decomposition   Roth-Karp   6
Covering (area)           Mis-Pga     36
Covering (unit delay)     Flowmap     3

5

Logic synthesis for LUT-based FPGA: synthesis problem size

                   Virtex-II  Virtex-II  Spartan-3  Spartan-3  Virtex-5  Virtex-5  Virtex-5  Virtex-5
                   1000       3000       1000       2000       LX30      LX50      LX85      LX110
Gates              1 million  3 million  1 million  2 million  -----     -----     -----     -----
Flip-flops         10240      28672      15360      40960      19200     28800     51840     69120
LUTs               10240      28672      15360      40960      19200     28800     51840     69120
Multipliers        40         96         24         40         32        48        48        64
Block RAM (kbit)   720        1728       432        720        1152      1728      3456

6

Logic synthesis for LUT-based FPGA: synthesis problem size

Algorithms have at least O(n^2) complexity
Combinatorial explosion
Resource problems: computation, memory

[Figure: runtime (min) versus problem size (SOP size)]

7

Logic synthesis for LUT-based FPGA: synthesis problem size

Algorithms have at least O(n^2) complexity
Combinatorial explosion
Resource problems: computation, memory

[Figure: runtime (min) versus problem size (SOP size)]

Divide and conquer: partitioning the design

8

Logic synthesis for LUT-based FPGA: boolean network

Directed acyclic graph

[Figure: boolean network from primary inputs to primary outputs]

9

Logic synthesis for LUT-based FPGA: boolean network and technology mapping

Input: a directed acyclic graph (DAG) G = (V, E)
Output: a K-feasible DAG G' = (V', E'): ∀ v ∈ V', |inputs(v)| ≤ K
1 node = 1 K-LUT

Technology mapping has many objectives:
Area: number of LUTs
Delay: critical paths
Routability: connection degrees and density

Flow: optimized boolean network → decomposition → feasible network → technology mapping → optimized feasible network
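The K-feasibility condition can be checked directly on a fanin map. The following is a minimal sketch, assuming the network is stored as a dictionary from each internal node to the list of nodes feeding it; the names `is_k_feasible`, `infeasible_nodes` and the example network are hypothetical, not taken from the slides.

```python
def is_k_feasible(fanins, k):
    """True if every node of the network has at most k distinct inputs."""
    return all(len(set(inputs)) <= k for inputs in fanins.values())


def infeasible_nodes(fanins, k):
    """Nodes that still need decomposition before they fit in a K-LUT."""
    return sorted(v for v, inputs in fanins.items() if len(set(inputs)) > k)


# Hypothetical network: g has 3 inputs, h has 5 inputs.
network = {
    "g": ["a", "b", "c"],
    "h": ["g", "d", "e", "f", "a"],
}

print(is_k_feasible(network, 4))     # h is too wide for a 4-LUT
print(infeasible_nodes(network, 4))  # nodes the decomposition step must split
```

Nodes reported by `infeasible_nodes` are the ones a decomposition step would have to split before mapping one node to one K-LUT.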

10

Chortle-crf: FPGA (Field Programmable Gate Array)

Minimize the number of LUTs used
Place and route the LUTs

11

Chortle-crf: dynamic programming

Technology mapping for LUT-based FPGA
Cluster logic nodes into k-LUTs: one LUT can implement any logic function of up to k inputs (fanin), stored as a truth table

[Figure: a network of logic nodes a to h clustered into LUTs; a 3-LUT with inputs a, b, c implements g = func(a, b, c)]
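To illustrate why a k-LUT can implement any function of up to k inputs, here is a small sketch that stores a function as a truth table of 2^k entries. The `LUT` class and the majority function used as an example are illustrative assumptions, not part of the course material.

```python
from itertools import product


class LUT:
    """A k-input look-up table: the function is stored as 2**k truth-table bits."""

    def __init__(self, k, func):
        self.k = k
        # Enumerate all input combinations, first input as most significant bit.
        self.table = [bool(func(*bits)) for bits in product((0, 1), repeat=k)]

    def __call__(self, *bits):
        if len(bits) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(bits)}")
        index = 0
        for bit in bits:
            index = (index << 1) | int(bool(bit))
        return self.table[index]


# Hypothetical 3-input function g = func(a, b, c): majority vote.
g = LUT(3, lambda a, b, c: (a and b) or (b and c) or (a and c))
print(len(g.table), g(1, 1, 0), g(1, 0, 0))
```

Whatever the function, the LUT cost is the same, which is why mapping only counts inputs per node and not the complexity of the logic inside.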

12

Chortle-crf: dynamic programming

Process from inputs to outputs
The solution for d + b1 + b2 must:
Minimize the number of LUTs
Minimize the fanin of the head LUT

[Figure: node d with subtrees b1 and b2, and node e]
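The idea of processing a tree from inputs to outputs while tracking the fanin of the head LUT can be sketched as follows. This is a simplified illustration, not the Chortle-crf algorithm itself: it assumes the network is a tree, that every gate already has at most k fanins, and it performs no node decomposition. The function name `min_luts` and the example tree are hypothetical.

```python
def min_luts(children, root, k):
    """Minimum number of k-LUTs covering the tree rooted at `root`.

    `children` maps each gate to its fanin nodes; nodes absent from the
    map are primary inputs. Assumes every gate has at most k fanins.
    """
    INF = float("inf")

    def cost(node):
        # best[u]: fewest LUTs below the head LUT when the head uses u inputs
        best = [0] + [INF] * k
        for kid in children[node]:
            if kid not in children:
                options = [(1, 0)]       # primary input: one wire, no LUT
            else:
                c = cost(kid)
                options = [(1, min(c))]  # kid keeps its own head LUT
                # kid absorbed into the head LUT: kid's head LUT disappears
                options += [(u, c[u] - 1) for u in range(1, k + 1) if c[u] < INF]
            new = [INF] * (k + 1)
            for used, luts in enumerate(best):
                for u, extra in options:
                    if luts < INF and used + u <= k:
                        new[used + u] = min(new[used + u], luts + extra)
            best = new
        return [b + 1 for b in best]     # count the head LUT itself

    return min(cost(root))


# Hypothetical tree: g = f(x, y), x = f(a, b), y = f(c, d)
tree = {"g": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
for k in (2, 3, 4):
    print(k, min_luts(tree, "g", k))
```

Keeping one cost per head-LUT fanin is what lets a parent decide whether absorbing a child still leaves room for its other inputs.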

13

Local search: 2-way partitioning problem

Let G = (V, E) with |V| = 2n. Find a partition V = V1 ∪ V2 such that |V1| = |V2| = n, while minimizing the number of edges crossing between the parts.
Neighborhood: pairwise exchange

[Figure: exchanging nodes a and b reduces the cut from c = 7 to c = 5 = 7 - 3 + 1]
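The effect of one pairwise exchange on the cut can be computed without rebuilding the partition. The sketch below assumes an unweighted graph given as an edge list and a partition given as a node-to-side dictionary; the helper names and the 4-node example are hypothetical and do not reproduce the figure.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def swap_gain(edges, part, a, b):
    """Cut reduction obtained by exchanging a and b (on opposite sides)."""

    def d(x):
        # external minus internal edges of x
        ext = inte = 0
        for u, v in edges:
            if x in (u, v):
                other = v if u == x else u
                if part[other] != part[x]:
                    ext += 1
                else:
                    inte += 1
        return ext - inte

    w_ab = sum(1 for u, v in edges if {u, v} == {a, b})
    return d(a) + d(b) - 2 * w_ab


# Hypothetical graph: every edge crosses the initial partition.
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
before = cut_size(edges, part)
gain = swap_gain(edges, part, 0, 2)
part[0], part[2] = part[2], part[0]
after = cut_size(edges, part)
print(before, gain, after)
```

The gain formula D(a) + D(b) - 2·w(a, b) is the standard one for pairwise exchange: edges between a and b stay in the cut after the swap, so they are subtracted twice.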

14

2-way partitioning: Kernighan-Lin heuristic

At each step, choose the exchange that maximizes the cut gain
Constraint: a node can be swapped only once
At most N/2 steps
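One pass of this heuristic can be sketched as below. This is a deliberately naive version that recomputes the cut for every candidate swap instead of maintaining gain values incrementally; the function names and the two-triangle example graph are assumptions made for illustration.

```python
from itertools import product


def cut_size(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def kl_pass(edges, part):
    """One Kernighan-Lin pass: returns the best partition seen and its cut."""
    part = dict(part)
    locked = set()
    best_part, best_cut = dict(part), cut_size(edges, part)
    for _ in range(len(part) // 2):          # at most N/2 exchanges
        side0 = [v for v in part if part[v] == 0 and v not in locked]
        side1 = [v for v in part if part[v] == 1 and v not in locked]
        if not side0 or not side1:
            break

        def cut_after(pair):
            trial = dict(part)
            trial[pair[0]], trial[pair[1]] = 1, 0
            return cut_size(edges, trial)

        # Best exchange among unlocked nodes, even if its gain is negative.
        a, b = min(product(side0, side1), key=cut_after)
        part[a], part[b] = 1, 0
        locked.update((a, b))                # each node is swapped only once
        current = cut_size(edges, part)
        if current < best_cut:
            best_part, best_cut = dict(part), current
    return best_part, best_cut


# Hypothetical graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
result, cut = kl_pass(edges, start)
print(cut_size(edges, start), cut)
```

Accepting exchanges with negative gain and keeping the best intermediate partition is what lets the pass climb out of some local minima.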

15

K-way partitioning: techniques

Extending 2-way partitioning: recursively, or Kernighan-Lin based
K-way partitioning: minimize the global cut, balance the part sizes
Multi-level partitioning (METIS)

16

Multi-level partitioning: hMETIS

Clustering: grouping nodes
Initial partition of the clustered graph
Unclustering
Refinement

[Figure: clustering phase, initial partitioning of the clustered graph, then the partition after unclustering and after refinement]
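The clustering step can be illustrated with a greedy matching on an ordinary graph. This is only a sketch of the general multi-level idea: hMETIS itself works on hypergraphs and uses more elaborate coarsening schemes, and the function `coarsen` below is a hypothetical helper.

```python
def coarsen(nodes, edges):
    """Greedy matching: merge each unmatched node with one unmatched neighbour.

    Returns the node-to-cluster map and the weighted edges of the coarse graph.
    """
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    cluster = {}
    next_id = 0
    for v in nodes:
        if v in cluster:
            continue
        cluster[v] = next_id
        for w in adj[v]:
            if w not in cluster:
                cluster[w] = next_id     # v and w form one coarse node
                break
        next_id += 1

    coarse_edges = {}
    for u, v in edges:
        cu, cv = cluster[u], cluster[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0) + 1
    return cluster, coarse_edges


# Hypothetical path graph 0 - 1 - 2 - 3
cluster, coarse_edges = coarsen([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
print(cluster, coarse_edges)
```

A partition of the coarse graph is projected back by giving each original node the side of its cluster, after which a refinement pass such as Kernighan-Lin can be applied.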

17

Multi-level partitioning: logic synthesis

Circuit partitioning
Nodes: logic gates
Edges: connections
Create and optimize subsystems

18

Partition-based logic synthesis: data partitioning

Motivations: divide and conquer
Simple, multiple algorithms
Problem size and runtimes
Parallelism

Quality?
Synthesis, even with multiple algorithms, is rarely optimal
Better heuristics
Limit information loss

19

Partition-based logic synthesis: network partitioning

Avoid maximum loss.
2-way partitioned network
Nodes are assigned to parts 1 and 2 while minimizing the cut and balancing the parts

20

Partition-based logic synthesis: network partitioning

Avoid information loss.
Primary I/Os are generated

21

Partition-based logic synthesis: quality loss due to information loss

Information loss: A and B in 2 distinct parts
Nodes are disconnected by the partitioning

22

Partition-based logic synthesis: load balancing

Each part must lead to the same computing load
Evaluate a priori the runtimes of the synthesis algorithms

[Figure: per-part runtimes for two partitionings]
Unbalanced parts: accumulated time 125, speedup 125/100 = 1.25
Balanced parts: accumulated time 125, speedup 125/35 = 3.57
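The two speedup figures follow from dividing the accumulated (sequential) time by the runtime of the slowest part, since parts run in parallel. A short sketch, assuming one processor per part; the individual per-part times below are hypothetical, chosen only so that the totals (125) and the largest parts (100 and 35) match the slide.

```python
def speedup(part_times):
    """Sequential time divided by parallel time (the slowest part)."""
    accumulated = sum(part_times)
    parallel = max(part_times)
    return accumulated / parallel


# Hypothetical per-part runtimes; only the sum and the maximum come from the slide.
unbalanced = [100, 15, 5, 5]
balanced = [35, 30, 30, 30]

print(round(speedup(unbalanced), 2))
print(round(speedup(balanced), 2))
```

With the same accumulated work, the speedup is bounded by the largest part, which is why the partitioner needs an a priori estimate of each part's synthesis runtime.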

23

Partition-based logic synthesis: load balancing

Depends on the synthesis algorithm
And on the network structure

Algorithm                (a)    (b)
Boolean simplification   O(n)   O(1)
Technology mapping       O(1)   O(n)

24

Partition-based logic synthesis: results

Results depend on the nature of the algorithms:
Local, e.g. k-feasible network decomposition
Global, e.g. delay optimization (critical path optimization)

Evaluation criteria:
Synthesis runtimes and speedups
Quality

LUT-based FPGA technology mapping tools:
Mis-PGA
Flowmap-d

25

Partition-based logic synthesis: Mis-PGA

Area optimization: number of LUTs (CLBs = 2 LUTs in xc4100 series)
Local and global decompositions (And/Or, Roth/Karp, kernels, …)
Global boolean simplifications (reinjection, substitution, ...)
Exact or heuristic covering (BCP)

Long runtimes
Automatic substitution of exact algorithms by heuristics for large problems
Impact on synthesis runtimes

26

Partition-based logic synthesis: Mis-PGA, quality

Area optimization: number of CLBs
Benchmark: 12 circuits from LGSYNTH'91
Number of LUTs (CLBs = 2 LUTs in xc4100 series FPGA)
Quality loss relative to global synthesis without partitioning

[Figure: loss (%) and number of CLBs versus number of partitions]

27

Partition-based logic synthesis: Mis-PGA, runtimes

Cumulated runtime on a single processor
Speedup

[Figure: runtimes (sec) and speedup versus number of partitions]

28

Partition-based logic synthesis: Flowmap-d, quality

Example: Flowmap-d, delay optimization
Critical paths (U) or nominal delay (N, congestion)

[Figure: loss (%) versus number of parts]

29

Partition-based logic synthesis: Flowmap-d, runtimes

Flowmap-d: O(n^2)

[Figure: time and speedup versus number of parts (processors), with the mean]

30

Partition-based logic synthesis: Flowmap-d results

Quality:
Loss (25%) with the unit delay model (critical path)
Gain (10%) with the nominal delay model (heuristics optimize better because the partitioning process exposes congested areas)

Speedup: superlinear for large-scale designs
Long runtimes are required to absorb the parallelism overhead

31

QoS in home network for VBR

User demand: high quality of service for video broadcast
Affordable, well-known, closed network environment
Stream priority according to other network usages
Bandwidth reservation: guaranteed QoS (no delay)

[Figure: home network with gateway, STB_1, connectedTV_1, tablet_1, console_1, PC_1, PC_2, PC_3, links L_1, L_2, L_3 and Wi-Fi]

32

Bandwidth reservation

VBR video encoding case:
Allocate the average rate? Quality loss
Allocate the peak rate? Bandwidth loss
Allocate the exact rate? Impracticable: hard constraints

Trade-off between peak and exact reservation

[Figure: throughput versus time]

33

Variable bitrate hull

Series of bitrates (amount of data per time slot): b_i for each time slot i (e.g. 1-second time slots)
r_i: reservation of network bandwidth for each time slot i
For all i, r_i > b_i: reservation hull that ensures QoS

[Figure: throughput versus time, showing the bitrates and the reservations r1 to r4]

34

Constraints on reservation policy

Two aspects are taken into account, since a reservation consists of configuring network resources:
M: bound on the number of successive different reservations r_i
P: minimal time between 2 reconfigurations (i.e. minimal reservation duration)

[Figure: throughput versus time, showing the throughputs, the reservations r1 to r4 and the lost bandwidth]
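For a given set of reconfiguration times, the hull and the lost bandwidth can be computed as in the sketch below. It assumes the tightest possible hull, where each reservation equals the peak bitrate of its segment (so r_i equals b_i at the peak slot), and it measures lost bandwidth as the sum of r_i - b_i. The function `hull` and the bitrate series are hypothetical.

```python
def hull(bitrates, starts):
    """Reservation per time slot and total lost bandwidth.

    `starts` lists the slots where a new reservation begins (starts[0] == 0).
    Assumption: each reservation is the peak bitrate of its segment.
    """
    bounds = list(starts) + [len(bitrates)]
    reservations = []
    for i, j in zip(bounds, bounds[1:]):
        peak = max(bitrates[i:j])
        reservations.extend([peak] * (j - i))
    lost = sum(r - b for r, b in zip(reservations, bitrates))
    return reservations, lost


# Hypothetical bitrate series, one value per time slot.
bitrates = [3, 5, 2, 8, 7, 1]

peak_only = hull(bitrates, [0])       # a single reservation at the peak rate
two_steps = hull(bitrates, [0, 3])    # one reconfiguration at slot 3
print(peak_only)
print(two_steps)
```

Adding reconfigurations lowers the lost bandwidth, which is what the bounds M and P restrict.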

35

Graph and optimization goal

[Figure: throughput versus time, showing the throughputs, the reservations r1 to r4 and the lost bandwidth; the corresponding path from start to end in the graph has total cost 301]

How to minimize the total cost?

36

Graph paths

[Figure: throughput versus time, showing the throughputs, the reservations and the lost bandwidth. The path from start to end using r1 to r4 has total cost 301; an alternative path using r'1 has total cost 255 (cost gain)]

37

Graph building

One node per time slot
For all j > i, an edge t_i → t_j: configuration at time t_i and reconfiguration at time t_j
Weights correspond to overcosts
An extra node at the end

[Figure: nodes t0 (start), t1, t2, ..., tn+1 (end), with edges weighted cost0→1, cost1→2, cost0→2, cost0→i, cost1→i]

38

Best solution computation

Best solution: minimal total overcost, i.e. a shortest path

Constraints on bandwidth allocation:
P: minimal time between 2 reconfigurations (i.e. minimal configuration duration)
M: bound on the number of successive different configurations c_i

Bellman on the DAG:
Remove edges i → j such that j - i < P
Run the first M steps of the Ford-Bellman algorithm
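The constrained shortest path can be sketched as a Bellman-Ford relaxation limited to M rounds, where round m holds the best cost of paths with exactly m edges. The edge weight used here is an assumption: the overcost of a segment is taken as the bandwidth lost when reserving its peak bitrate. The function `best_hull` and its parameter names are hypothetical.

```python
def best_hull(bitrates, max_segments, min_len):
    """Cheapest hull with at most `max_segments` (M) reservations,
    each lasting at least `min_len` (P) time slots.

    Returns (total overcost, list of slots where a reservation starts).
    """
    n = len(bitrates)            # nodes 0..n, node n is the extra end node
    INF = float("inf")

    def weight(i, j):
        # Assumed overcost of edge i -> j: peak reservation minus actual data.
        seg = bitrates[i:j]
        return max(seg) * len(seg) - sum(seg)

    dist = [0] + [INF] * n       # paths with 0 edges
    preds = []                   # preds[m - 1][j]: predecessor of j in round m
    best_cost, best_rounds = INF, None
    for m in range(1, max_segments + 1):
        new = [INF] * (n + 1)
        pred = [None] * (n + 1)
        for j in range(1, n + 1):
            # Only edges with j - i >= min_len are kept.
            for i in range(0, j - min_len + 1):
                if dist[i] < INF and dist[i] + weight(i, j) < new[j]:
                    new[j] = dist[i] + weight(i, j)
                    pred[j] = i
        preds.append(pred)
        dist = new
        if dist[n] < best_cost:
            best_cost, best_rounds = dist[n], m

    if best_rounds is None:
        return INF, []
    starts, j = [], n
    for r in range(best_rounds - 1, -1, -1):
        j = preds[r][j]
        starts.append(j)
    return best_cost, starts[::-1]


# Hypothetical bitrate series.
print(best_hull([1, 1, 5, 5], max_segments=2, min_len=1))
print(best_hull([1, 1, 5, 5], max_segments=1, min_len=1))
```

Because all edges go forward in time, the graph is a DAG and each round only needs one sweep over the nodes.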

39

Simulation results

NS2 simulation tool
Hierarchical token bucket (HTB)
VBR sources: series of bitrates
Delay and buffer size measurements

[Figure (a), fixed size HTB: delay (s) and buffer size (kB) versus bandwidth allocation]
[Figure (b), M = 149: delay (s) and buffer size (kB) versus average bandwidth allocation]

40

Implementation

Server (Linux Ubuntu system): streaming component (VLC), streaming observer, reservation policy, network interface
Client (Beagleboard): streaming client (mplayer)

[Figure: server and client components, exchanging hull values, times, configurations, the stream and raised errors]

Configuration    # errors   # errors / # frames
Reference        8          5.6 x 10^-4
1429 kbits/s     11         7.7 x 10^-4
640 kbits/s      80         56.0 x 10^-4
Hull P5          17         12.0 x 10^-4