
Philips Journal of Research Vol. 43 No. 1 1988

Philips J. Res. 43, 1-22, 1988 R 1176

PLACEMENT OF SHAPEABLE BLOCKS*)

by F.M.J. DE BONT, E.H.L. AARTS, P. MEEHAN* and C.G. O'BRIEN*

Philips Research Laboratories, P.O. Box 80000, 5600 JA Eindhoven, The Netherlands
* M.E.E. Department, Trinity College, Dublin 2, Ireland

Abstract

A simulated annealing-based algorithm is presented for the placement of shapeable blocks for IC layouts. The algorithm minimizes both the enveloping area and the estimated total wire length of a set of blocks interconnected by nets. The size of the blocks may vary within limits. Overlap among blocks is allowed but increasingly penalized as the optimization process continues. The presence of overlap is found to be essential for obtaining good solutions to the placement problem. The various terms in the cost function are discussed in detail. In addition to the sequential implementation the authors present a parallel implementation of the placement algorithm on an experimental multiprocessor architecture using a clustered simulated annealing algorithm. For both the sequential and the parallel implementation it is concluded that the placement algorithm performs well: near-optimal solutions are achieved within reasonable computation time.

Keywords: Building blocks, optimization, placement, simulated annealing.

1. Introduction

Recent advances in VLSI technology enable the design of large, densely packed digital circuits and systems. Designing such systems is a complex task which has become impracticable without making use of advanced design methods and automated design aids such as silicon compilers. A wealth of details is presented in the special issue of Computer 1). An interesting design method recently introduced by De Man et al. 2) is based on the so-called 'meet in the middle' strategy. This design method advocates a separation between high-level system design and low-level (reusable) circuit design, 'meeting' each other at the building block level. System design transforms a high-level specification to a description of the system in terms of building blocks. Parameterizable building blocks are synthesized by using a module generator. Within this strategy, floorplanning can be viewed as a layout process at the intermediate level. It combines the high-level synthesis results with the low-level building-block requirements into a layout.

*) This research was supported in part by the European Community under contract ESPRIT 1058.

Placement is one of the main problems within floorplanning (in addition to global and detailed routing) and can be defined as follows. Given a set of rectangular blocks (the building blocks of a VLSI circuit) interconnected by a number of nets, the placement problem is the problem of finding an assignment (a placement) of the blocks to 2-dimensional grid points in such a way that no blocks overlap and the weighted sum of the area of the enveloping rectangle and the total wire length of the nets interconnecting the blocks is minimal. Furthermore, we consider the case where the size of the blocks may vary within limits (shapeability).

The placement problem is one of the hardest problems within the set of design automation problems. It can be viewed as a generalization of the quadratic assignment problem and it therefore belongs to the class of NP-hard problems 3). Consequently it is highly unlikely that optimization algorithms for solving placement problems require an amount of time that can be bounded by a polynomial in the size of the problem instance, resulting in prohibitive computation times for large problem instances. Thus one has to resort to approximation algorithms that find near-optimal solutions within polynomial time. Over the years a number of such algorithms have been proposed, most of them following a deterministic approach. A review is presented in ref. 4.

More recently a number of algorithms have been proposed using simulated annealing. Simulated annealing 5) is a generally applicable randomization technique based on the analogy between simulating the physical annealing process of solids and solving large combinatorial optimization problems. For a detailed review the reader is referred to ref. 6. The general applicability and inherent flexibility of the algorithm are features that make it especially well suited for application to difficult problems such as the placement problem.

The first work in this area was reported by Jepsen and Gelatt 7). They applied the simulated annealing algorithm to the macro placement problem with fixed-sized rectangles. In refs 8 and 9, Sechen and Sangiovanni-Vincentelli present a software package called TimberWolf in which simulated annealing is successfully applied to placement and routing of macro/custom cells, standard cells and gate arrays. Parallel implementations of the simulated annealing algorithm for placement problems have been reported by a number of authors, e.g. Banerjee and Jones: standard cell placement on a hypercube computer 10); Casotto and Sangiovanni-Vincentelli: macro cell placement 11); and Kravitz and Rutenbar: standard cell placement 12).

In this paper we discuss the application of the simulated annealing algorithm to placement of shapeable blocks. Novel features are the use of the additional degree of freedom given by the shapeability, an improved mechanism for calculating cost differences between successive configurations, and the development of a highly parallel simulated annealing algorithm based on the clustered simulated annealing algorithm introduced by Aarts et al. 13). The application of simulated annealing to placement problems involves a number of parameters related to the choice of the cost function and the generation mechanism that is used. These parameters must be carefully chosen to obtain good results. This paper strongly focuses on implementational aspects of this kind.

The paper is organized as follows. We first discuss some general implementational premises of the simulated annealing algorithm (sec. 2). In sec. 3 we define a problem representation and a cost function. The generation mechanism and corresponding calculation of differences in cost are discussed in sec. 4. The sequential implementation of the algorithm is discussed in sec. 5. The parallel algorithm and its implementation are presented in sec. 6. We conclude with some remarks (sec. 7).

2. Application of the simulated annealing algorithm

In applying the simulated annealing algorithm one commonly resorts to an implementation in which a sequence of Markov chains is generated at descending values of a control-parameter *). Markov chains are generated by continuously trying to transform a current configuration into a new one by applying a generation mechanism and an acceptance criterion. Here configurations are given by placements. Application of the simulated annealing algorithm requires specification of three distinct items 6):
(i) a concise problem representation,
(ii) a transition mechanism, and
(iii) a cooling schedule.
We will now elaborate on these items in more detail.

*) This parameter plays a role similar to that of the temperature in the physical annealing process.

(i) A concise representation of the problem consists of a configuration representation and an expression for the cost function. The cost function must be defined such that it represents the cost effectiveness of each different placement. It is desirable to express both the configuration representation and the cost function by simple expressions that are easy to manipulate.

(ii) Transformation of one configuration into another involves three steps: firstly, a new configuration must be generated from a current one; secondly, the difference in cost between the two configurations must be calculated; and thirdly, a decision must be made whether or not the new configuration is to be accepted. In our placement algorithm a new configuration is generated by a simple (local) rearrangement of the current configuration (by a swap, displacement, reshaping or reorientation). Evaluation of the difference in the cost function must be very simple since in practice this is the most time-consuming part of the algorithm. The decision to accept a new configuration is based on the Metropolis acceptance criterion 14), which states that the probability of accepting a new configuration is given by

$$\Pr\{\text{accept}\} = \begin{cases} 1 & \text{if } \Delta C \le 0, \\ \exp(-\Delta C / c) & \text{if } \Delta C > 0, \end{cases} \tag{1}$$

where $\Delta C$ denotes the difference in cost between the new and the current configuration, and $c$ is the control-parameter.

(iii) Carrying out the optimization along the lines of an annealing process requires specification of the parameters determining the cooling schedule. These parameters are 6): the start value $c_0$ of the control-parameter, a decrement function of the control-parameter, the length of the individual Markov chains and a stop criterion. Here, we apply the cooling schedule introduced by Aarts and Van Laarhoven 15) (see also sec. 5).

We will revisit these items in more detail in the remainder of this paper.
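The Metropolis acceptance criterion of eq. (1) can be illustrated with a short Python sketch. The function names and the use of Python's `random.Random` generator are assumptions made for this illustration; they are not taken from the authors' 'C' implementation.

```python
import math
import random


def acceptance_probability(delta_c: float, c: float) -> float:
    """Probability of accepting a cost difference delta_c at control-parameter c > 0."""
    if delta_c <= 0:
        # Improvements and cost-neutral moves are always accepted.
        return 1.0
    # Deteriorations are accepted with a probability that shrinks as c decreases.
    return math.exp(-delta_c / c)


def accept(delta_c: float, c: float, rng: random.Random) -> bool:
    """Compare a uniform random number in [0, 1) with the acceptance probability."""
    return rng.random() < acceptance_probability(delta_c, c)
```

For a fixed positive cost difference the acceptance probability falls as c is lowered, which is what lets the algorithm escape local minima early on and settle later.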

3. Problem representation

3.1. The configuration representation

For our purposes we consider a placement problem with $M$ rectangular blocks and $N$ multiterminal nets. A block $i$ has fixed area $a_i$ and is represented by the Cartesian coordinates of its lower left corner $(x_{i1}, y_{i1})$ and its upper right corner $(x_{i2}, y_{i2})$, while net $j$ is represented by the Cartesian coordinates $(p_{j1}, q_{j1}), \ldots, (p_{jn_j}, q_{jn_j})$ of all its $n_j$ terminals. A placement is represented by a configuration which is given by a four-tuple $(x, y, p, q)$, where $x$ and $y$ are $2M$-vectors and $p$ and $q$ are $K$-vectors ($K = \sum_{j=1}^{N} n_j$) given by

$$x = (x_{11}, x_{12}, \ldots, x_{M1}, x_{M2}), \qquad y = (y_{11}, y_{12}, \ldots, y_{M1}, y_{M2}), \tag{2}$$

$$p = (p_{11}, \ldots, p_{1n_1}, \ldots, p_{N1}, \ldots, p_{Nn_N}), \quad \text{and} \quad q = (q_{11}, \ldots, q_{1n_1}, \ldots, q_{N1}, \ldots, q_{Nn_N}). \tag{3}$$

Furthermore each block $i$ is given an initial height $h_i$ and width $w_i$. These values are used for the reshaping of the blocks (see sec. 4.1).

3.2. The cost function

According to the formulation of the placement problem given in the introduction, there are three criteria that determine the cost of a placement: the area of the enveloping rectangle, the total wire length, and the overlap among blocks. In the following we discuss each of these criteria in more detail.

Area

Let $l^{(e)}(x)$ denote the envelope-width function defined as

$$l^{(e)}(x) = \max_{j=1,\ldots,M} \{x_{j2}\} - \min_{j=1,\ldots,M} \{x_{j1}\}, \tag{4}$$

and similarly for the envelope-height function $l^{(e)}(y)$. Then the area $C_A$ of the rectangular envelope surrounding the blocks in a configuration is given by

$$C_A(x, y) = l^{(e)}(x)\, l^{(e)}(y). \tag{5}$$

Figure 1 shows the enveloping rectangle of a placement with four blocks.

Fig. 1. The envelope of a placement with 4 blocks.
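As an illustration of the area term, the following Python sketch computes the area of the enveloping rectangle as the product of the envelope width and the envelope height. Storing a block as a tuple (x1, y1, x2, y2) of its lower left and upper right corner is a representation chosen for this example only.

```python
from typing import List, Tuple

# A block is stored as (x1, y1, x2, y2): lower left and upper right corner.
Block = Tuple[float, float, float, float]


def envelope_area(blocks: List[Block]) -> float:
    """Area of the smallest axis-parallel rectangle that encloses all blocks."""
    width = max(b[2] for b in blocks) - min(b[0] for b in blocks)
    height = max(b[3] for b in blocks) - min(b[1] for b in blocks)
    return width * height
```

For two blocks (0, 0, 2, 1) and (1, 1, 4, 3) the envelope is 4 wide and 3 high, so the area term equals 12.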

Wire length

For the purpose of speeding up the calculations, the total length of the wires of a net is approximated by the perimeter of the bounding box of the net, which is the rectangular box that can be drawn through the most extreme terminals of the net. In most cases this is a reasonable approximation and it has been used before by a number of authors 7-9). Figure 2 shows an example of a bounding box of a net in a placement with three blocks.

Fig. 2. The bounding box of a net j in a placement of 3 blocks.

Let $l_j^{(w)}(p)$ denote the net-wire width function for a net $j$, defined as

$$l_j^{(w)}(p) = \max_{k=1,\ldots,n_j} \{p_{jk}\} - \min_{k=1,\ldots,n_j} \{p_{jk}\}, \tag{6}$$

and similarly for the net-wire height function $l_j^{(w)}(q)$, and let $\xi_j$ be a measure of the number of wires in the net, e.g. the number of bus lines comprising the net. Then the estimated total wire length $C_W$ of the nets in a configuration is given by

$$C_W(p, q) = \sum_{j=1}^{N} \left( l_j^{(w)}(p) + l_j^{(w)}(q) \right) \xi_j. \tag{7}$$
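The bounding-box estimate of the wire length can be sketched in Python as follows. Each net is assumed to be given as a list of terminal coordinates (p, q) together with a weight for the number of wires in the net; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Terminal = Tuple[float, float]  # (p, q) coordinates of one terminal


def wire_length(nets: Sequence[List[Terminal]], weights: Sequence[float]) -> float:
    """Weighted sum over all nets of bounding-box width plus bounding-box height."""
    total = 0.0
    for terminals, weight in zip(nets, weights):
        ps = [p for p, _ in terminals]
        qs = [q for _, q in terminals]
        width = max(ps) - min(ps)    # net-wire width of this net
        height = max(qs) - min(qs)   # net-wire height of this net
        total += (width + height) * weight
    return total
```

Only the extreme terminals of a net contribute, which is why the estimate is cheap to evaluate and to update after a move.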


Overlap

The overlap between two blocks $i$ and $j$ is simply taken as the area of the overlapping rectangle (shaded area in fig. 3). Let $l_{ij}^{(o)}(x)$ denote the overlap-width function for two blocks $i$ and $j$, defined as

$$l_{ij}^{(o)}(x) = \max \{ 0,\ \min \{x_{i2}, x_{j2}\} - \max \{x_{i1}, x_{j1}\} \}, \tag{8}$$

and similarly for the overlap-height function $l_{ij}^{(o)}(y)$. Then the total overlap $C_O$ in a configuration is given by

$$C_O(x, y) = \sum_{i=1}^{M} \sum_{j=i+1}^{M} l_{ij}^{(o)}(x)\, l_{ij}^{(o)}(y). \tag{9}$$
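A minimal Python sketch of the overlap term is given below. It assumes that the overlap width of two blocks is the length of the intersection of their x-intervals, clamped at zero when the intervals are disjoint, and likewise for the height; blocks are stored as (x1, y1, x2, y2) tuples, a representation chosen for this example.

```python
from typing import List, Tuple

Block = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def interval_overlap(a1: float, a2: float, b1: float, b2: float) -> float:
    """Length of the intersection of [a1, a2] and [b1, b2], zero if disjoint."""
    return max(0.0, min(a2, b2) - max(a1, b1))


def total_overlap(blocks: List[Block]) -> float:
    """Sum of the overlap areas over all unordered pairs of blocks."""
    total = 0.0
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            xi1, yi1, xi2, yi2 = blocks[i]
            xj1, yj1, xj2, yj2 = blocks[j]
            total += (interval_overlap(xi1, xi2, xj1, xj2)
                      * interval_overlap(yi1, yi2, yj1, yj2))
    return total
```

The double loop visits each pair once, which matches the double sum over i and j > i in the text.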

Fig. 3. The overlap between 2 blocks i and j.

The total cost function now is chosen as

$$C(x, y, p, q) = C_A(x, y) + \lambda_w C_W(p, q) + \lambda_o C_O(x, y) / c, \tag{10}$$

where $\lambda_w$ and $\lambda_o$ are weighting factors and $c$ is the control-parameter. The weighting factors are introduced to balance the contributions of the different terms in the cost function. To obtain near-optimal solutions the average contribution of the area and wire length to the cost function must be approximately the same. An imbalance would put more emphasis on either one of the two optimization criteria.

The value of $\lambda_w$ is problem-dependent and must be experimentally determined and reevaluated at the start of each Markov chain. Reevaluation is required since the generation mechanism is chosen to be dependent on the control-parameter (see sec. 4). Good results were obtained by choosing

$$\lambda_w = \gamma_w \frac{\bar{C}_A}{\bar{C}_W}, \tag{11}$$

where $\bar{C}_A$ and $\bar{C}_W$ denote the average values of the area and wire length over the previous Markov chain, and $\gamma_w$ denotes a constant which is experimentally determined to be 0.2.

The overlap in the cost function of eq. (10) is a penalty term. The factor $c^{-1}$ in the contribution of the overlap is introduced to ensure complete elimination of overlap in the final placement obtained by the simulated annealing algorithm (as $c$ approaches 0 the penalty for overlap becomes infinite). Furthermore it is desirable that blocks are allowed to overlap during a major part of the optimization process. Therefore the weighting factor $\lambda_o$ is introduced. The idea behind the evaluation of $\lambda_o$ is to use a constant factor which is proportional to the area and inversely proportional to the maximum possible overlap in a placement. Thus a measure for $\lambda_o$ is given by

$$\lambda_o = \gamma_o \frac{A_{tot}}{O_{max}}, \tag{12}$$

where $A_{tot}$ is simply taken as the sum of the areas of the individual blocks ($A_{tot} = \sum_{i=1}^{M} a_i$) and $O_{max}$ is the maximum possible overlap (obtained when all blocks are placed on top of each other with coinciding lower left corners in their initial shape and orientation). The factor $\gamma_o$ was determined experimentally to be 0.1.

Evidently the choices for the weighting factors are quite heuristic and a more formal theoretical derivation would be of interest. However, experimentally it turned out that these choices do very well in the sense that they yield good solutions to the placement problem.
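The way the three cost terms are combined can be sketched in Python. The wire-length weight follows the ratio of average area to average wire length with the constant 0.2 quoted in the text. The overlap weight is written as a constant (0.1) times the total block area divided by the maximum possible overlap, which is an assumed form based on the verbal description above; all function names are hypothetical.

```python
def wire_weight(mean_area: float, mean_wire: float, gamma_w: float = 0.2) -> float:
    """Weight that balances the wire-length term against the area term."""
    return gamma_w * mean_area / mean_wire


def overlap_weight(total_block_area: float, max_overlap: float,
                   gamma_o: float = 0.1) -> float:
    """Assumed form: proportional to area, inversely proportional to max overlap."""
    return gamma_o * total_block_area / max_overlap


def total_cost(c_area: float, c_wire: float, c_overlap: float,
               lam_w: float, lam_o: float, c: float) -> float:
    """Area plus weighted wire length plus overlap penalty scaled by 1/c."""
    return c_area + lam_w * c_wire + lam_o * c_overlap / c
```

Evaluating `total_cost` for the same placement at decreasing values of c shows the overlap penalty growing without bound, which is what drives overlap out of the final placement.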

4. The transition mechanism

4.1. The generation mechanism

New configurations are generated from current configurations by simple local rearrangements of a placement. We distinguish between the following possible rearrangements: reorientations, displacements, reshapings, and swaps. Each of these manipulations will now be considered in more detail.

Reorientations

A block can have eight different orientations, i.e. four by rotation over 0, 90, 180 or 270 degrees, and four by rotation (over the same set of angles) followed by a reflection in the axis through the centre of the block and parallel with the x-axis. The orientation given by a rotation over 0 degrees corresponds to the current orientation of a block. A reorientation is achieved by randomly choosing a block and changing its orientation into one of the seven other orientations (not including the rotation over 0 degrees), where the lower left corner of the new orientation coincides with the lower left corner of the current one.

Displacements

A displacement is carried out by translating a randomly chosen block over a horizontal distance $\Delta x$ and a vertical distance $\Delta y$. The lengths of the displacements are randomly chosen from an interval whose bounds are determined by a range limiter $\eta(c)$, which is dependent on the control-parameter, i.e.

(13)

The range limiter is used to decrease the length of a displacement in the course of the optimization process. Initially large displacements are likely to be accepted. As the optimization process continues, large displacements will be less and less frequently accepted and therefore they might as well be eliminated from the generation mechanism. This is achieved by choosing the following expression for the range limiter:

(14)

where $t_{min}$ and $t_{max}$ are bounds on the displacement lengths. Evidently $\eta$ decreases from $t_{max}$ for $c = c_0$ to $t_{min}$ for $c \to 0$. The value of $t_{min}$ was chosen equal to 1 (the pitch of the grid used for solving the placement problem). The value of $t_{max}$ was calculated from the following expression:

(15)

where $A_{tot}$ denotes the sum of the areas of the individual blocks, and $\gamma_t$ denotes a constant which was chosen to be 0.5.

Reshapings

A reshaping is achieved by changing the dimensions of a randomly chosen block while leaving the area of the block unchanged. To prevent the dimensions of a block from becoming unrealistic for practical application, two bounds $\varphi_{i1}$ and $\varphi_{i2}$ are introduced for each block $i$, limiting the aspect ratio of the block. Let $0 < \varphi_{i1} \le \varphi_{i2}$, then the coordinates of a block $i$ must always satisfy the following inequality:

$$\varphi_{i1} \le \frac{(x_{i2} - x_{i1})\, h_i}{(y_{i2} - y_{i1})\, w_i} \le \varphi_{i2}, \tag{16}$$

where $w_i$ and $h_i$ are the initial width and height of block $i$, respectively. If $\varphi_{i1} = \varphi_{i2} = 1$ the dimensions of the block will remain unchanged. The positions of the terminals at the block will be transformed in such a way that their relative positions on the sides of the block will remain unchanged.
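An area-preserving reshaping can be sketched in Python as follows. The sketch assumes that the lower left corner stays fixed and that the caller supplies the target value of the ratio (width times initial height) over (height times initial width); how that value is drawn is not specified here. Both function names are hypothetical.

```python
import math
from typing import Tuple

Block = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def reshape(block: Block, w0: float, h0: float, ratio: float) -> Block:
    """Resize a block so that (width * h0) / (height * w0) equals ratio, keeping its area."""
    x1, y1, x2, y2 = block
    area = (x2 - x1) * (y2 - y1)
    aspect = ratio * w0 / h0           # target width / height
    width = math.sqrt(area * aspect)   # from width * height = area
    height = area / width
    # Assumption: the lower left corner is kept as the reference point.
    return (x1, y1, x1 + width, y1 + height)


def satisfies_bounds(block: Block, w0: float, h0: float,
                     phi1: float, phi2: float) -> bool:
    """Check the aspect-ratio constraint for a block with initial size w0 by h0."""
    x1, y1, x2, y2 = block
    value = (x2 - x1) * h0 / ((y2 - y1) * w0)
    return phi1 <= value <= phi2
```

With a ratio of 1 the block returns to its initial proportions, so bounds of 1 and 1 leave a block in its initial shape effectively fixed.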

Swaps

A swap is carried out by interchanging the positions of two randomly chosen blocks. The lower left corners are used as reference points, i.e. the blocks are swapped by interchanging the coordinates of their lower left corners, keeping the relative position of the other corners w.r.t. the lower left corner unchanged.
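The swap can be illustrated with a small Python sketch in which blocks are stored as (x1, y1, x2, y2) tuples, a representation chosen for this example. Each block keeps its own width and height and only the lower left corners are exchanged.

```python
from typing import List, Tuple

Block = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def swap(blocks: List[Block], i: int, j: int) -> List[Block]:
    """Return a new placement in which blocks i and j exchange lower left corners."""
    xi1, yi1, xi2, yi2 = blocks[i]
    xj1, yj1, xj2, yj2 = blocks[j]
    wi, hi = xi2 - xi1, yi2 - yi1   # size of block i stays the same
    wj, hj = xj2 - xj1, yj2 - yj1   # size of block j stays the same
    result = list(blocks)
    result[i] = (xj1, yj1, xj1 + wi, yj1 + hi)
    result[j] = (xi1, yi1, xi1 + wj, yi1 + hj)
    return result
```

Because the two blocks generally differ in size, a swap easily creates overlap with neighbouring blocks.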

4.2. The ratio between different rearrangements

The ratios between the different types of rearrangements used in the optimization process have a large impact on the quality of the final placement. They should depend on the value of the control-parameter since different rearrangements are required at different stages of the optimization process. For instance, at the beginning of the optimization process it is desirable to have large rearrangements so as to make large steps in the configuration space. This is achieved using 40 % displacements (with a large value of the range limiter) and 20 % swaps. Reorientations and reshapings are restricted to 20 % each. As the optimization proceeds and $c$ approaches 0, overlap is increasingly penalized and most rearrangements resulting in a placement with overlapping blocks will be rejected. Swaps are rearrangements that often lead to placements with overlapping blocks. Consequently the percentage of swaps may be decreased to 0 as the value of the control-parameter approaches 0, since these rearrangements become less and less effective. Meanwhile, the percentage of displacements is increased up to 60 % for the following reason. Since the range limiter is approximately $t_{min}$ as $c$ approaches 0, only small displacements will occur, which are favourable at this stage of the optimization process, since they allow the blocks to shuffle about gently until they settle into a near-optimal configuration. The change in the percentages of swaps $v_s$ and displacements $v_d$ is calculated using the following expressions:

(17)

(18)

where $v_d^{(i)}$ and $v_s^{(i)}$ are the initial percentages of displacements and swaps, respectively, $v_d^{(f)}$ is the final percentage of displacements, and $v_d^{(i)} + v_s^{(i)} - v_d^{(f)} = 0$.

The percentages of reorientations and reshapings are kept constant at 20 % each throughout the optimization process, since changes induced by these rearrangements are required at any stage in the optimization process.
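Given the current percentages, selecting the type of the next rearrangement amounts to sampling from a four-way discrete distribution. The Python sketch below maps a uniform random number u in [0, 1) to a rearrangement type; the percentages are passed in as arguments, so the sketch makes no assumption about how they vary with the control-parameter. The function name and the ordering of the types are choices made for this illustration.

```python
def pick_rearrangement(u: float, v_d: float, v_s: float,
                       v_o: float = 20.0, v_r: float = 20.0) -> str:
    """Map u in [0, 1) to a rearrangement type; percentages should sum to 100."""
    shares = [("displacement", v_d), ("swap", v_s),
              ("reorientation", v_o), ("reshaping", v_r)]
    cumulative = 0.0
    for name, percentage in shares:
        cumulative += percentage / 100.0
        if u < cumulative:
            return name
    # Guard against rounding when u is very close to 1.
    return "reshaping"
```

With the initial shares (40 % displacements, 20 % swaps) a value u = 0.45 selects a swap, whereas with the final shares (60 % displacements, 0 % swaps) the same value selects a displacement.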

4.3. Cost difference calculation

Generation of new configurations and especially the calculation of the differences in cost are the most time-consuming parts of the simulated annealing algorithm since they have to be carried out for each transition. The efficiency of these calculations may be increased if the evaluation of the cost of the new configuration is performed on the basis of the cost of the current configuration, i.e. if the cost difference is calculated incrementally. For the placement problem, cost differences can be calculated incrementally. The required amount of work, however, depends on the type of rearrangements and on the positions of the blocks involved.

Area

If the position of a block is changed, the shape and thus the area of the enveloping rectangle may change. We distinguish between two possible situations:
- If a block is moved to a position outside the current enveloping rectangle, then the change in position of the enveloping rectangle is directly determined by the new position of the block.
- If a block determining the position of the enveloping rectangle (i.e. one of the outermost blocks in the placement) is moved to a position inside the current enveloping rectangle, then a search of all other blocks must take place in order to determine which block gives the new position of the enveloping rectangle. An inward move of a block whose current position did not affect the position of the original enveloping rectangle will not change the position of the new enveloping rectangle.

Clearly, these situations can easily be implemented by making use of a simple direct search algorithm, or by using linked lists determining the order of each of the coordinates of all blocks. In our implementations we used the first approach to calculate the incremental difference in cost of the area ($\Delta C_A$).

Wire length

Moving a block to a new position only affects the positions of the nets that are connected to the block. Thus we have to recalculate only the length of the nets to which the given block is connected in order to obtain the value of the difference in cost of the wire length ($\Delta C_W$). In most cases this will be only a relatively small subset of the set of all nets. Since we use the bounding box for approximating the wire length of a net, we can apply the same incremental technique for recalculating the wire length as we used for recalculating the area.

Overlap

Moving a block $i$ to a new position requires recalculation of the overlap cost of that block. The difference in cost of the overlap ($\Delta C_O$) is given by:

$$\Delta C_O = \sum_{j=1,\, j \ne i}^{M} O'_{ij} - \sum_{j=1,\, j \ne i}^{M} O_{ij}, \tag{19}$$

where $O'_{ij}$ and $O_{ij}$ are the overlap between block $j$ and block $i$ in the new configuration and in the current configuration, respectively. In this expression the summation runs over a single sum, compared to the double sum in the expression of the total overlap, eq. (9).

The total difference in cost $\Delta C$ is simply given by the sum of the individual cost differences, weighted as in eq. (10), i.e.

$$\Delta C = \Delta C_A + \lambda_w \Delta C_W + \lambda_o \Delta C_O / c. \tag{20}$$
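The incremental evaluation of the overlap difference for a single moved block can be sketched in Python. Blocks are stored as (x1, y1, x2, y2) tuples and the pairwise overlap is taken as the area of the intersection rectangle; the combination of the three differences uses the same weights as the total cost function. All names are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pair_overlap(a: Block, b: Block) -> float:
    """Area of the intersection rectangle of two blocks, zero if they are disjoint."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return width * height


def overlap_delta(blocks: List[Block], i: int, moved: Block) -> float:
    """Change in total overlap when block i is replaced by `moved` (single sums only)."""
    old = sum(pair_overlap(blocks[i], b) for j, b in enumerate(blocks) if j != i)
    new = sum(pair_overlap(moved, b) for j, b in enumerate(blocks) if j != i)
    return new - old


def cost_delta(d_area: float, d_wire: float, d_overlap: float,
               lam_w: float, lam_o: float, c: float) -> float:
    """Total cost difference from the three individual differences."""
    return d_area + lam_w * d_wire + lam_o * d_overlap / c
```

Only the M - 1 pairs that involve the moved block are inspected, so the work per transition grows linearly rather than quadratically with the number of blocks.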

5. Sequential implementation and numerical results

The simulated annealing algorithm for the placement problem has been implemented on a VAX-11/780 computer using the programming language 'C'. As mentioned in sec. 2, we applied the cooling schedule introduced by Aarts and Van Laarhoven 15) with the following parameters:

Initial value of the control-parameter ($c_0$)

The value of $c_0$ was chosen such that an initial acceptance ratio ($\chi_0$) of 80 % was achieved. Experimental results showed that the choice of $c_0$ is not critical.

Decrement of the control-parameter ($\delta$)

The rate of the decrement in the control-parameter is controlled by the distance-parameter $\delta$. We used various values of $\delta$. It was found that $\delta = 0.5$ is an optimal value in the sense that larger values led to fast deterioration of the quality of the final result due to the fast decrement in the control-parameter, whereas smaller values of $\delta$ increase the execution time substantially without appreciable effect on the quality of the final results.

Length of the Markov chains ($L$)

The theoretical basis for the choice of the length of the Markov chains is given by the size of the neighbourhoods defined by the generation mechanism. This size equals the number of different configurations that can be obtained from a single rearrangement. For the placement problem using the generation mechanism described in the previous section this number equals:

$$L = 7M + \frac{M(M-1)}{2} + M n_d(c) + M n_r, \tag{21}$$

where $M$ is the number of blocks, $n_d(c)$ is the number of different displacements and $n_r$ is the number of different reshapings that may be carried out. The value of $L$ may be very large due to the large values of $n_d(c)$ (particularly when $c$ is close to $c_0$ since in that case the value of the range limiter $\eta(c)$ is large). In our implementations we used the minimum of the chain length given by eq. (21) and a constant chain length (independent of $c$) given by the following expression:

$$L = L_0 M^2, \tag{22}$$

where $L_0$ is a constant. For large values of $c$ the chain length given by eq. (22) is much smaller than the one given by eq. (21) (using $L_0 = 20$). However, numerical experiments indicated that the results obtained with both chain lengths are of equal quality for large values of $c$. Clearly, smaller values of $L$ yield a reduction of the execution time, which is favourable. Due to the decrement in the range limiter the chain length given by eq. (21) decreases as $c$ decreases. At a certain value of $c$ there is a cross-over point, where the value given by eq. (22) becomes larger than the value given by eq. (21). Thus it is favourable to use at each value of the control-parameter the minimum of the values given by eq. (21) and eq. (22).
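The choice of the chain length as the minimum of the neighbourhood size and the constant bound can be sketched in Python. The terms follow the text: seven alternative orientations per block, one swap per unordered pair of blocks, and a number of displacements and reshapings per block. The numerical values used for the number of displacements and reshapings below are purely illustrative.

```python
def neighbourhood_size(m: int, n_d: int, n_r: int) -> int:
    """Number of configurations reachable by one rearrangement of m blocks."""
    reorientations = 7 * m           # seven other orientations per block
    swaps = m * (m - 1) // 2         # one swap per unordered pair of blocks
    displacements = m * n_d          # n_d possible displacements per block
    reshapings = m * n_r             # n_r possible reshapings per block
    return reorientations + swaps + displacements + reshapings


def chain_length(m: int, n_d: int, n_r: int, l0: int = 20) -> int:
    """Minimum of the neighbourhood size and the constant bound l0 * m**2."""
    return min(neighbourhood_size(m, n_d, n_r), l0 * m * m)
```

For 20 blocks with, say, 1000 displacements and 10 reshapings per block the neighbourhood size is 20530, so the bound of 8000 applies; with only 8 displacements per block the neighbourhood size of 690 is the smaller value.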

Stop criterion

A good criterion for terminating the simulated annealing algorithm is given by Aarts and Van Laarhoven 15). This stop criterion is based on the extrapolation carried out for the average value of the cost limited to area and wire length. Good results were obtained using a value of the stop-parameter equal to $\varepsilon_s = 0.005$. As with the initial value of the control-parameter, the choice of the value of the stop-parameter is not very critical.

Results

The placement algorithm has been applied to a number of problem instances ranging in size from 6 through 20 blocks and 6 through 60 nets. Figure 4 shows an example of the final placement obtained for a problem instance with 20 blocks and 9 nets. All blocks in this example were shapeable (with parameters $\varphi_1 = 0.4$ and $\varphi_2 = 2.5$). The deviation, defined as the difference between the area of the enveloping rectangle and the sum of the areas of the individual blocks, is only 4 %.

Fig. 4. Final placement of a problem instance with 20 blocks and 9 nets with shapeability. The deviation is 4 % (see text).

To demonstrate the effect of reshaping, the placement algorithm was applied using fixed-size blocks ($\varphi_1 = \varphi_2 = 1$). The deviation of the final placement obtained in this way was 17 % (see fig. 5).

Fig. 5. Final placement of a problem instance with 20 blocks and 9 nets without shapeability. The deviation is 17 % (see text).

Figure 6 shows the final cost of area and wire length as a function of the total number of iterations (= number of Markov chains × the Markov chain length) for the problem instance with 20 blocks and 9 nets. We observe that the quality of the final result gradually improves as the number of transitions (or the Markov chain length) increases. For this problem instance the average time taken to compute a single iteration is 5 milliseconds, resulting in a total CPU time of one hour for the result shown in fig. 4.

Similar results have been obtained for other problem instances, showing that the placement algorithm performs rather well, and that shapeability is a powerful degree of freedom in finding placements with minimum area.

6. The parallel placement algorithm

In addition to the sequential placement algorithm described in the previous sections, we also designed a parallel algorithm. Our parallel algorithm is based on the clustered simulated annealing algorithm introduced by Aarts et al. 13). Here we restrict ourselves to a summary of the most important aspects of the clustered algorithm.

6.1. The clustered simulated annealing algorithm

The basic idea underlying the clustered algorithm is to use all available processors in a multiprocessor for the generation of one Markov chain. From the sequential simulated annealing algorithm we recall that in the initial stage of the optimization process approximately all new configurations are accepted, i.e. the acceptance ratio $\chi(c) \approx 1$. As the optimization process continues, fewer new configurations are accepted, the acceptance ratio drops and consequently the number of updates decreases. In the clustered algorithm a Markov chain is generated by using all available processors in such a way that rearrangements, cost-difference calculations and acceptance decisions are executed in parallel. Updates, however, can only be executed sequentially and while an update is being executed all other processors are halted until they are adapted to the new configuration. To achieve this the procedures WAIT(s) and SIGNAL(s) are introduced, acting on the same semaphore s. This is schematically depicted in pseudo Pascal in diagram 1.

Fig. 6. The cost of the final placement as a function of the number of iterations (or the length L of the Markov chains) in the annealing process, for the problem instance with 20 macros and 9 nets.



DIAGRAM 1. The Markov chain generation routine in pseudo Pascal.

PROCEDURE MARKOV CHAIN GENERATION;
begin
    for k := 1 to l do
    begin
        WAIT(s);
        SIGNAL(s);
        start := config. i;
        GENERATE(config. i -> config. j, ΔCij);
        if ΔCij <= 0 then accept
        else if exp(-ΔCij/c) > random[0,1] then accept;
        if accept then
        begin
            WAIT(s);
            if start = config. i then UPDATE(config. j);
            SIGNAL(s);
        end;
    end;
end.

The sequence of calls to the procedures WAIT(s) and SIGNAL(s) at the beginning of each step in a chain prohibits processors from starting the generation of new configurations while another processor is updating. Processors that have already started generating a new configuration are reset. Clearly, generating Markov chains as described above provides a satisfactory speedup only in the lower c region. To increase the efficiency in the upper c region as well, we use the concept of sub-chains. For large c-values it is much more cost-effective to divide a Markov chain into a number of sub-chains and to have each of the available processors working on its own sub-chain. In this way the activities of the processors do not interfere and a linear speedup can be achieved. This so-called division algorithm can be described as follows 13). Let K be the number of processors, L the length of a Markov chain, l = [L/K] the length of a sub-chain, m the sub-chain number and n the Markov chain number. Then the nth Markov chain is generated by letting each processor generate a sub-chain of length l. All sub-chains are generated at the same value of the control-parameter. The initial configuration of all sub-chains is identical. It is chosen probabilistically from the K final configurations obtained by generating the (n - 1)th Markov chain. For the decrement of the control-parameter we use a decrement function similar to that of the sequential implementation 15).

We now return to the clustered algorithm. To increase the efficiency in the higher c region of the clustered algorithm we include the division algorithm in this region. The clustered algorithm can then be described as follows. Given K processors, start off with the division algorithm such that each individual processor evaluates its own sub-chain. At some point in the optimization process the processors are clustered two by two and the sub-chains are enlarged. Each enlarged sub-chain is now evaluated by a cluster of two processors. This process is repeated until all processors are combined in one single cluster, evaluating the same sub-chain of length equal to the length of a full Markov chain. Here the clusters are formed by doubling the number of processors (this is not essential for the algorithm) and it is assumed that K = 2^p for some positive integer p. At the end of each Markov chain a ratio for the efficiency of clustering is calculated; if the ratio is larger than a certain value (e.g. 50 %), then a switch is made to half the number of clusters.

Such a switch consists of three actions:
- divide the number of clusters by 2;
- multiply the length of sub-chains by 2;
- distribute the K processors equally over the number of clusters.

This clustered simulated annealing algorithm has been applied successfully to the travelling salesman problem and exhibits a linear speedup 13).
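The bookkeeping behind the division algorithm and the three switch actions can be sketched as follows. This is an illustrative sketch only. The names ClusterState, initial_state, switch and maybe_switch are hypothetical, [L/K] is assumed to mean integer division, and the efficiency ratio is taken as a given input because its computation is not specified in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    clusters: int           # number of clusters (one sub-chain per cluster)
    procs_per_cluster: int  # processors cooperating on one sub-chain
    subchain_len: int       # length of each sub-chain


def initial_state(K, L):
    """Division algorithm: K one-processor clusters, sub-chains of length L // K."""
    if K < 1 or K & (K - 1):
        raise ValueError("K must be a power of two")
    return ClusterState(clusters=K, procs_per_cluster=1, subchain_len=L // K)


def switch(state, K):
    """Apply the three actions of a switch."""
    clusters = state.clusters // 2                 # divide the number of clusters by 2
    return ClusterState(
        clusters=clusters,
        procs_per_cluster=K // clusters,           # distribute the K processors equally
        subchain_len=state.subchain_len * 2,       # multiply the sub-chain length by 2
    )


def maybe_switch(state, K, ratio, threshold=0.5):
    """Switch at the end of a Markov chain if the efficiency ratio exceeds the threshold."""
    if state.clusters > 1 and ratio > threshold:
        return switch(state, K)
    return state
```

With K = 8 and L = 2000, initial_state gives eight sub-chains of length 250. Three successive switches lead to a single cluster of eight processors working on one chain of length 2000, after which maybe_switch leaves the state unchanged.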

6.2. Parallel implementation and numerical results

The parallel simulated annealing algorithm for placement was implemented in 'C' on an experimental multiprocessor system. The system is a general-purpose parallel machine consisting of a number of processors (presently eight), a data bus and a common memory. The processors have a Motorola 68000 microprocessor as CPU (8 MHz) and 512 kbyte DRAM local memory. The local memory is dual-ported, i.e. it is possible to perform read and write operations on the memory of other processors. The processors do not have a floating point co-processor. Consequently, all floating point operations are performed by CPU-expensive software (400 µs vs 10 µs for an elementary operation). The data bus is 32 bits wide (integers and reals in one transfer) and runs at 12 MHz. The common memory consists of an 8 Mbyte DRAM. The parallel system is operated through a host computer (VAX-11/780). Information exchange between the processors takes place by means of the common memory. Mutual exclusion is achieved by using SIGNAL(s) and WAIT(s). The semaphore s is multi-valued and contains a TAS field and a waiting queue (FIFO). An elementary action of the SIGNAL and WAIT procedures is an indivisible read-modify-write action of the TAS field of the semaphore, for which the TAS instruction of the MC68000 is used. We mention that the average CPU time of a single iteration on our multiprocessor system is about five times longer than on a VAX-780 computer.

The parallel algorithm was implemented using the same parameter setting of the cooling schedule as used in the sequential implementation. The final results obtained with the parallel implementation are of similar quality to the results obtained by the sequential implementation. The speedup S(K) as a function of the number of processors K is defined as the time needed by one processor divided by the time needed by K processors to solve the same problem, under the condition that the same quality of the solutions is obtained. Figure 7 shows the speedup for different values of the Markov chain length obtained for the problem instance with 20 blocks and 9 nets. It is observed that for the larger Markov chain lengths the speedup is linear. At smaller chain lengths the efficiency decreases due to the fact that the lengths of the sub-chains become too short to run the division algorithm efficiently.
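The definition of S(K) translates directly into a small helper. The sketch below is illustrative: the efficiency S(K)/K is the usual definition and is assumed here, since the section does not state it explicitly, and the timings in the usage note are made-up values, not measurements from the paper.

```python
def speedup(t_one, t_k):
    """S(K): time needed by one processor divided by time needed by K processors."""
    if t_one <= 0 or t_k <= 0:
        raise ValueError("timings must be positive")
    return t_one / t_k


def efficiency(t_one, t_k, k):
    """Assumed definition: speedup per processor, S(K) / K."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return speedup(t_one, t_k) / k
```

For hypothetical timings of 800 s on one processor and 100 s on eight processors, speedup(800.0, 100.0) gives 8.0 and efficiency(800.0, 100.0, 8) gives 1.0, which corresponds to a linear speedup. The comparison is only meaningful when both runs reach solutions of the same quality, as required above.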

Fig. 7. The speedup of the parallel placement algorithm as a function of the number K of processors. The curves are obtained for different Markov chain lengths. [Plot not reproduced: speedup on the vertical axis, K on the horizontal axis; instance with 20 macros and 9 nets; curve labels range from 20 to 20000.]

For the clustered algorithm there is no theoretical limit to the number of processors (maintaining the same efficiency). However, similar to the numerical experiments carried out for the travelling salesman problem 13), we anticipate a saturation of the efficiency for the placement problem too, due to the fact that updates are performed on the global memory of the multiprocessor system (bus saturation).

7. Conclusions

We have presented a sequential and parallel simulated annealing-based placement algorithm for shapeable blocks. The algorithm minimizes a weighted sum of the area of the rectangle enveloping the blocks and the total estimated wire length. The wire length is estimated by using a bounding box approximation. Overlap of blocks is allowed but increasingly penalized as the optimization process continues. Generation of new configurations includes reorientation, displacement and reshaping of blocks as well as the swap of two blocks. Numerical experiments show that good results can be obtained with our algorithm both for the sequential and for the parallel implementation (using the clustered simulated annealing algorithm). Shapeability of blocks is found to be of great importance for obtaining good results. The speedup of the parallel algorithm is linear, but with our multiprocessor architecture, bus saturation will be the limiting factor for large-scale parallelism.

We conclude that simulated annealing is well suited to approximate 'dirty' combinatorial optimization problems such as the block placement problem treated in this paper. Moreover, it turns out that this class of problems can be treated very well by a parallel implementation of the simulated annealing algorithm. This result becomes especially significant when multiprocessor systems are used with processors having a speed comparable to the speed of a conventional mainframe.

Finally we mention that the application of parallel annealing algorithms to the placement problem can be pursued in a new direction by implementations on a connection machine, as is proposed by Casotto and Sangiovanni-Vincentelli 16). Another possible direction of progress emerges from the use of special purpose optimization hardware such as the Boltzmann machine, which has recently been shown to be very successful for approximating a number of combinatorial optimization problems 17,18).



REFERENCES

1) Computer, 19 (1986).
2) H. de Man, J. Rabaey, P. Six and L. Claesen, IEEE Design and Test, 12, 13 (1986).
3) S. Sahni and A. Bhatt, Proc. 17th Design Autom. Conf., Minneapolis, Minnesota, 1980, p. 402.
4) J. Soukup, Proc. IEEE, 69, 1281 (1981).
5) S. Kirkpatrick, C.D. Gelatt Jr. and M.P. Vecchi, Science, 220, 671 (1983).
6) P.J.M. van Laarhoven and E.H.L. Aarts, Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1987.
7) D.W. Jepsen and C.D. Gelatt Jr., Proc. Int. Conf. Comput. Design, 1983, p. 495.
8) C. Sechen and A. Sangiovanni-Vincentelli, IEEE J. Solid State Circuits, SC-20, 510 (1985).
9) C. Sechen and A. Sangiovanni-Vincentelli, Proc. 23rd Design Autom. Conf., Las Vegas, 1986, p. 432.
10) P. Banerjee and M. Jones, Proc. IEEE Int. Conf. Comput. Aided Design, Santa Clara, 1986, p. 34.
11) A. Casotto and A.L. Sangiovanni-Vincentelli, Proc. IEEE Int. Conf. Comput. Aided Design, Santa Clara, 1986, p. 30.
12) S.A. Kravitz and R.A. Rutenbar, Proc. 23rd Design Autom. Conf., Las Vegas, 1986, p. 567.
13) E.H.L. Aarts, F.M.J. de Bont, J.H.A. Habers and P.J.M. van Laarhoven, Integration, 4, 209 (1986).
14) N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, J. Chem. Phys., 21, 1087 (1953).
15) E.H.L. Aarts and P.J.M. van Laarhoven, Philips J. Res., 40, 193 (1985).
16) A. Casotto and A.L. Sangiovanni-Vincentelli, Proc. IEEE Int. Conf. Comput. Aided Design, Santa Clara, 1987.
17) E.H.L. Aarts and J.H.M. Korst, Europ. J. Operational Res., submitted, 1987.
18) J.H.M. Korst and E.H.L. Aarts, J. Parallel Distributed Comp., submitted, 1987.

Authors

Emile H. L. Aarts; Drs degree (Mathematics and Physics), University of Nijmegen, The Netherlands, 1979; Ph.D. University of Groningen, The Netherlands, 1983; Philips Research Laboratories, Eindhoven, 1983- . His research interests include combinatorial optimization and neural networks. He presently also has a consulting and teaching position at the Technical University Eindhoven.

Frans M.J. de Bont; Ing. degree (Electrical Engineering), HTS 's-Hertogenbosch, The Netherlands, 1984; Philips Research Laboratories, Eindhoven, 1984- . He is involved in the research on parallel algorithms and computer architectures, and parallelization of algorithms for multi-processor systems.


Charlie O'Brien; B.A., B.A.I. degree (Electronic and Computer Engineering), 1986 and M.Sc. degree (Microelectronics), 1987, Trinity College, University of Dublin, Ireland. Philips Research Laboratories, Eindhoven, 1987- . His thesis work was related to optimization by simulated annealing and the application of the technique to automatic placement and routing and to logic minimization. His present work is in the field of CAD for VLSI systems, on data path synthesis for audio signal processing applications.


Peter Meehan; B.A., B.A.I. degree (Electronic and Computer Engineering), University of Dublin, Ireland, 1987. In 1986 he co-operated in a project involving research into VLSI design automation at Philips Research Laboratories, Eindhoven. He is presently undertaking a Master's degree in the Computer Science department of Trinity College, Dublin, his task being the development of a graphics system for a small computer.
