
Compressing Cube-Connected Cycles and Butterfly Networks

Ralf Klasing,1 Reinhard Luling,2 Burkhard Monien2

1 Department of Computer Science, University of Warwick, Coventry CV4 7AL, England

2 Department of Mathematics and Computer Science, University of Paderborn, 33095 Paderborn, Germany

Received 10 April 1995; accepted 16 June 1997

Abstract: We consider the simulation of large cube-connected cycles (CCC) and large butterfly networks (BFN) on smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCCs and BFNs can be embedded into smaller networks of the same type with (a) dilation 2 and optimum load, (b) dilation 1 and optimum load in most cases, and (c) dilation 1 and nearly optimum load in all cases. Our results show that large CCCs and BFNs can be simulated very efficiently on smaller ones. Additionally, we implemented our algorithm for compressing CCCs and ran several experiments on a Transputer network, which showed that our technique also behaves very well from a practical point of view. © 1998 John Wiley & Sons, Inc. Networks 32: 47–65, 1998

Correspondence to: R. Klasing; e-mail: rak@dcs.warwick.ac.uk

An extended abstract of this paper was presented at the 2nd IEEE Symposium on Parallel and Distributed Processing (1990).

Contract grant sponsor: German Research Association (DFG); contract grant numbers: Mo 285/9-1, Me 872/6-1

Contract grant sponsor: ESPRIT Basic Research Action; contract grant numbers: 7141 (ALCOM II) and No. 20244 (ALCOM IT).

© 1998 John Wiley & Sons, Inc. CCC 0028-3045/98/010047-19

1. INTRODUCTION

Over the past few years, much research has been done in the field of interconnection networks for parallel computer architectures (for a survey, cf. [12, 15, 20]), as most of these architectures can actually be realized in hardware (e.g., as a network of Transputers). Much of the work has been focused on the capability of certain networks to simulate other network or algorithm structures, in order to execute parallel algorithms of a special structure efficiently on different processor networks (see, e.g., [3, 14]). But the problem generally neglected is that most of the existing algorithms are designed for arbitrarily large networks (see, e.g., [18, 21, 22]), whereas, in practice, the processor network will be fixed and of smaller size. Thus, the larger network must be simulated in an efficient way (i.e., needing little simulation time) on the smaller target network.

Solutions to this problem, which is commonly modeled as a graph embedding problem, have been proposed so far for common network structures like hypercubes, binary trees, meshes, shuffle-exchange networks, and de Bruijn networks in [1, 2, 4–7, 9, 10, 16, 17]. So far, only partial results are known about two classes of networks which are very important for practical purposes, namely, the cube-connected cycles (CCC), as introduced in [18], and the butterfly network (BFN).

In [4, 9, 17], embeddings with optimum dilation and load are presented in the case of embedding CCCs and

BFNs of dimension $l$ into $k$ where $k \mid l$. The authors also restrict themselves to special kinds of embeddings of a very regular structure, like coverings [4], homogeneous emulations [9], and homomorphisms [17]. Because of the very restricted nature, Bodlaender [4] and Peine [17] were also able to classify their embeddings completely. In [2], a general procedure was described for mapping parallel algorithms into parallel architectures. This procedure was applied to the CCC network achieving dilation 1, but a very high load. Also, only special kinds of embeddings, so-called contractions, are considered.

This paper investigates the embedding problem for CCC and BFN taking into account general embedding functions and any possible network dimension. The central statement derived is

Large CCCs and BFNs can be simulated very efficiently (almost optimally) on smaller ones.

In more detail, we prove for the cube-connected cycles network that CCC($l$) can be embedded into CCC($k$), $l > k$, with

(a) Dilation 2 and optimum load $(l/k)\,2^{l-k}$.
(b) Dilation 1 and optimum load, if $l/k \ge 2$.
(c) Dilation 1 and optimum load, for certain values of $l$, $k$, if $l/k < 2$.
(d) Dilation 1 and "nearly" optimum load, for all other values of $l$, $k$, if $l/k < 2$.

More precisely, the load in cases (c) and (d) is

$$\frac{2p-1}{p}\,2^{l-k} \quad \text{for } p \in \{2, 3, \ldots\} \text{ such that } \frac{2p-3}{p-1} < \frac{l}{k} \le \frac{2p-1}{p}.$$

This is optimal if $l/k \in \{(2p-1)/p \mid p \in \{2, 3, \ldots\}\}$ and very close to optimal in all other cases.

For the butterfly network, we show that BFN($l$) can be embedded into BFN($k$), $l > k$, with (a)–(d) as above. Here, the load in cases (c) and (d) is specified by

$$\frac{2p-2}{p}\,2^{l-k} \quad \text{for } p \in \{7, 8, \ldots\} \text{ such that } \frac{2p-4}{p-1} < \frac{l}{k} \le \frac{2p-2}{p},$$

$$\frac{5}{3}\,2^{l-k} \quad \text{for } \frac{l}{k} \le \frac{5}{3}.$$

This is optimal if $l/k \in \{(2p-2)/p \mid p \in \{6, 7, \ldots\}\}$ and very close to optimal in all other cases.

The general strategy of the embeddings is to map $2^{l-k}$ cycles in CCC($l$)/BFN($l$) of length $l$ onto one cycle in CCC($k$)/BFN($k$) of length $k$ and to distribute the nodes of the guest cycles as evenly as possible on the host cycle. A specification or a variation of this general idea will yield many of the results above. But in one important case, namely, the dilation 1 embedding of BFN($l$) into BFN($k$) for $l/k < 2$, this construction is not powerful enough. (It only yields load $2 \cdot 2^{l-k}$.) In this case, we introduce a method that allows local rearrangement of nodes between different host cycles. As an effect, the load is distributed more evenly in the corresponding part of the host network.

Our results have a major impact on many fields in parallel processing, as CCCs and BFNs have been generally accepted as two benchmark architectures for multicomputers because of their fixed degree and good routing capabilities. To show the practical applicability of our techniques, we have built a tool which allows mapping of any CCC of dimension $l$ to a fixed CCC of dimension $k < l$. We present results for a distributed branch and bound algorithm solving the vertex cover problem [13] and for a program which simulates an arbitrary distributed algorithm. Using our mapping tool, many important algorithms for large CCCs and BFNs can be implemented very efficiently on a network of realistic size. For example, the simulation of a parallel random access machine (PRAM) on large BFNs as described in [19] can now easily be transferred to a given network of processors configured as a butterfly of a fixed size.

2. DEFINITIONS

(Most of the terminology is taken from [12, 15, 18, 20].) Let, for any graph $G = (V, E)$, $V(G) = V$ denote the set of vertices of $G$, and $E(G) = E$ denote the set of edges of $G$. Let $\bar{a}$ denote the binary complement of $a \in \{0, 1\}$.

Networks

The (wrapped) cube-connected cycles network of dimension $m$, denoted by CCC($m$), has vertex-set $V_m = \{0, 1, \ldots, m-1\} \times \{0, 1\}^m$, where $\{0, 1\}^m$ denotes the set of length-$m$ binary strings. For each vertex $v = (i, a) \in V_m$, $i \in \{0, 1, \ldots, m-1\}$, $a \in \{0, 1\}^m$, we call $i$ the level and $a$ the position-within-level (PWL) string of $v$. The edges of CCC($m$) are of two types: For each $i \in \{0, 1, \ldots, m-1\}$ and each $a = a_0 a_1 \cdots a_{m-1} \in \{0, 1\}^m$, the vertex $(i, a)$ on level $i$ of CCC($m$) is connected


• By a cycle-edge with vertex $((i+1) \bmod m, a)$ on level $(i+1) \bmod m$ and
• By a cross-edge with vertex $(i, a(i))$ on level $i$.

Here, $a(i) = a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{m-1}$. For each $a \in \{0, 1\}^m$, the cycle

$$(0, a) - (1, a) - \cdots - (m-1, a) - (0, a)$$

of length $m$ will be denoted by $C_a(m)$ or $C_a$.

CCC($m$) has $m 2^m$ nodes, $3m 2^{m-1}$ edges, and degree 3. An illustration of CCC(3) is shown in Figure 1.

Fig. 1. The cube-connected cycles CCC(3).

The (wrapped) butterfly network of dimension $m$, denoted by BFN($m$), has vertex-set $V_m = \{0, 1, \ldots, m-1\} \times \{0, 1\}^m$, where $\{0, 1\}^m$ denotes the set of length-$m$ binary strings. For each vertex $v = (i, a) \in V_m$, $i \in \{0, 1, \ldots, m-1\}$, $a \in \{0, 1\}^m$, we call $i$ the level and $a$ the position-within-level (PWL) string of $v$. The edges of BFN($m$) are of two types: For each $i \in \{0, 1, \ldots, m-1\}$ and each $a = a_0 a_1 \cdots a_{m-1} \in \{0, 1\}^m$, the vertex $(i, a)$ on level $i$ of BFN($m$) is connected

• By a cycle-edge with vertex $((i+1) \bmod m, a)$ and
• By a cross-edge with vertex $((i+1) \bmod m, a(i))$

on level $(i+1) \bmod m$. Again, $a(i) = a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{m-1}$. For each $a \in \{0, 1\}^m$, the cycle

$$(0, a) - (1, a) - \cdots - (m-1, a) - (0, a)$$

of length $m$ will be denoted by $C_a(m)$ or $C_a$. BFN($m$) has $m 2^m$ nodes, $m 2^{m+1}$ edges, and degree 4. An illustration of BFN(3) is shown in Figure 2. To obtain a clearer picture, level 0 has been replicated.

Lexicographical Orderings

For many of the proofs later on, we will need the notion of lexicographical ordering. For this purpose, let the lexicographical numbering $\mathrm{Lex} : \{0, 1, \ldots, m-1\} \times \{0, 1\}^n \to \mathbb{N}$ be defined as

$$\mathrm{Lex}(i, a_0 a_1 \cdots a_{n-1}) = i\,2^n + a_0 2^{n-1} + a_1 2^{n-2} + \cdots + a_{n-1} 2^0.$$

Then, the lexicographical order on $\{0, 1, \ldots, m-1\} \times \{0, 1\}^n$ is specified by

$$(i, a) < (j, b) \iff \mathrm{Lex}(i, a) < \mathrm{Lex}(j, b),$$

and the lexicographical distance between $(i, a)$ and $(j, b)$ is defined as

$$|\mathrm{Lex}(i, a) - \mathrm{Lex}(j, b)|.$$

Even Distributions

Let $a_1, b_1, a_2, b_2 \in \mathbb{N}_0$ such that $b_1 \ge a_1$, $b_2 \ge a_2$, $b_1 - a_1 \ge b_2 - a_2$. Let $r \in \mathbb{N}$. A function

$$d : \{a_1, a_1+1, \ldots, b_1\} \times \{0, 1\}^r \to \{a_2, a_2+1, \ldots, b_2\}$$

is called an even distribution of $\{a_1, \ldots, b_1\} \times \{0, 1\}^r$ among the nodes of $\{a_2, \ldots, b_2\}$ according to the lexicographical order on $\{a_1, \ldots, b_1\} \times \{0, 1\}^r$ if $d$ satisfies the following properties:

• $d(a_1, 0^r) = a_2$, $d(b_1, 1^r) = b_2$,
• $d(i, b) \le d(i', b')$, if $(i, b) \le (i', b')$ according to the lexicographical order on $\{a_1, \ldots, b_1\} \times \{0, 1\}^r$,
• $\left\lceil \frac{b_1 - a_1 + 1}{b_2 - a_2 + 1} \cdot 2^r \right\rceil - 1 \le |d^{-1}(j)| \le \left\lceil \frac{b_1 - a_1 + 1}{b_2 - a_2 + 1} \cdot 2^r \right\rceil$ for all $j \in \{a_2, \ldots, b_2\}$.

[Note that such a distribution function $d$ can always be constructed for the parameters $a_1$, $b_1$, $a_2$, $b_2$, $r$ as above.]

Network Simulations

Let $G$ and $H$ be finite undirected graphs. An embedding of $G$ into $H$ is a mapping $f$ from the nodes of $G$ to the nodes of $H$. $G$ is called the guest graph and $H$ is called



the host graph of the embedding $f$. The dilation of the embedding $f$ is the maximum distance in the host between the images of adjacent guest nodes. Its load factor is the maximum number of vertices of the guest graph $G$ that are mapped to the same host graph vertex. [The optimum load achievable is the ratio $|V(G)|/|V(H)|$ of the number of nodes in $G$ and $H$.] Its edge congestion is the maximum number of edges that are routed through a single edge of $H$. [A routing is a mapping $r$ of $G$'s edges to paths in $H$, $r(v_1, v_2)$ = a path from $f(v_1)$ to $f(v_2)$ in $H$.]

Fig. 2. The butterfly graph BFN(3).

An embedding of $G$ into $H$ is an abstraction of a simulation of $G$ by $H$ as an interconnection network. The dilation and edge congestion are measures for the communication time, the load for the maximum work to be done by a processor. In this paper, we focus on dilation and load. Edge congestion will only play a minor role.

3. THEORETICAL RESULTS AND PROOFS

3.1. General Embedding Strategy

The basic idea of most of the embeddings presented here is to map $2^{l-k}$ cycles $C_{a_1}, C_{a_2}, \ldots, C_{a_{2^{l-k}}}$ in CCC($l$)/BFN($l$) of length $l$ onto one cycle $C_b$ in CCC($k$)/BFN($k$) of length $k$ and to distribute the $l \cdot 2^{l-k}$ nodes of $C_{a_1}, \ldots, C_{a_{2^{l-k}}}$ appropriately among the $k$ nodes of $C_b$. Two different kinds of such embeddings are distinguished:

First Construction

Let $a_0, a_1, \ldots, a_{k-1} \in \{0, 1\}$. The cycles $\{C_{a_0 a_1 \cdots a_{l-1}} \mid a_k, a_{k+1}, \ldots, a_{l-1} \in \{0, 1\}\}$ of CCC($l$)/BFN($l$) are mapped onto the cycle $C_{a_0 a_1 \cdots a_{k-1}}$ in CCC($k$)/BFN($k$) as follows: For each $0 \le i \le k-1$, the node $i$ of each $C_{a_0 a_1 \cdots a_{l-1}}$, $a_k, a_{k+1}, \ldots, a_{l-1} \in \{0, 1\}$, is mapped onto the node $i$ of $C_{a_0 a_1 \cdots a_{k-1}}$. The nodes $k, k+1, \ldots, l-1$ of each $C_{a_0 a_1 \cdots a_{l-1}}$ are distributed appropriately among the nodes of $C_{a_0 a_1 \cdots a_{k-1}}$.

The exact distribution of the nodes of $\{C_{a_0 a_1 \cdots a_{l-1}} \mid a_k, a_{k+1}, \ldots, a_{l-1} \in \{0, 1\}\}$ on $C_{a_0 a_1 \cdots a_{k-1}}$ is determined by a distribution function

$$d : \{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k} \to \{0, 1, \ldots, k-1\}$$

which specifies, for each node number $\in \{k, k+1, \ldots, l-1\}$ on the guest cycle $C_{a_0 a_1 \cdots a_{l-1}}$ and each cycle index $a_k a_{k+1} \cdots a_{l-1}$, the position on the host cycle $C_{a_0 a_1 \cdots a_{k-1}}$. (On each host cycle $C_{a_0 a_1 \cdots a_{k-1}}$, $a_0, a_1, \ldots, a_{k-1} \in \{0, 1\}$, the same distribution function is used.) Formally, the embedding $f : V(\mathrm{CCC}(l))/V(\mathrm{BFN}(l)) \to V(\mathrm{CCC}(k))/V(\mathrm{BFN}(k))$ is of the form

$$f(i, a_0 a_1 \cdots a_{l-1}) := \begin{cases} (i, a_0 \cdots a_{k-1}) & \text{if } 0 \le i \le k-1, \\ (d(i, a_k a_{k+1} \cdots a_{l-1}), a_0 \cdots a_{k-1}) & \text{else.} \end{cases}$$

The load of $f$ is determined by the distribution function $d$. Therefore, $d$ should distribute the guest nodes as evenly as possible on each host cycle. All the cross-edges

$$(i, a) - (i, a(i)), \quad 0 \le i \le k-1,$$
$$(i, a) - (i+1, a(i)), \quad 0 \le i \le k-2$$

of CCC($l$)/BFN($l$) are mapped onto a corresponding cross-edge in CCC($k$)/BFN($k$). Likewise, all the cycle-edges

$$(i, a) - (i+1, a), \quad 0 \le i \le k-2$$

of CCC($l$)/BFN($l$) are mapped onto a corresponding cycle-edge in CCC($k$)/BFN($k$). All the other edges of



CCC($l$)/BFN($l$) are mapped onto a path on a single cycle $C_b$ in CCC($k$)/BFN($k$). So, in this case, the dilation is directly dependent on the distribution of the guest nodes on the host cycle and stands partly in contrast to the desired evenness of the distribution as explained above. For low dilation, the nodes $(i, a_0 a_1 \cdots a_{l-1})$ and $(j, b_0 b_1 \cdots b_{l-1})$ of the cycles $C_{a_1}, C_{a_2}, \ldots, C_{a_{2^{l-k}}}$ of CCC($l$)/BFN($l$) with a small lexicographical distance between $(i, a_k a_{k+1} \cdots a_{l-1})$ and $(j, b_k b_{k+1} \cdots b_{l-1})$ should be mapped close together on the cycle $C_b$ in CCC($k$)/BFN($k$).

Second Construction

Let $p(0), p(1), \ldots, p(k-1) \in \{0, 1, \ldots, l-1\}$, $p(0) < p(1) < \cdots < p(k-1)$. Let $\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1) \in \{0, 1, \ldots, l-1\} \setminus \{p(0), p(1), \ldots, p(k-1)\}$ such that $\bar{p}(0) < \bar{p}(1) < \cdots < \bar{p}(l-k-1)$. [Note that $\{p(0), p(1), \ldots, p(k-1)\} \cup \{\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1)\} = \{0, 1, \ldots, l-1\}$.]

Let $a_j \in \{0, 1\}$ for $j \in \{p(0), p(1), \ldots, p(k-1)\}$. The cycles $\{C_{a_0 a_1 \cdots a_{l-1}} \mid a_j \in \{0, 1\} \text{ for } j \in \{\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1)\}\}$ of CCC($l$)/BFN($l$) are mapped onto the cycle $C_{a_{p(0)} a_{p(1)} \cdots a_{p(k-1)}}$ in CCC($k$)/BFN($k$) such that the nodes $0, 1, \ldots, l-1$ of each $C_{a_0 a_1 \cdots a_{l-1}}$ are distributed appropriately among the nodes of $C_{a_{p(0)} a_{p(1)} \cdots a_{p(k-1)}}$.

The exact distribution of the nodes of $\{C_{a_0 a_1 \cdots a_{l-1}} \mid a_j \in \{0, 1\} \text{ for } j \in \{\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1)\}\}$ on $C_{a_{p(0)} a_{p(1)} \cdots a_{p(k-1)}}$ is determined by a distribution function

$$d : \{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k} \to \{0, 1, \ldots, k-1\}$$

which specifies, for each node number $\in \{0, 1, \ldots, l-1\}$ on the guest cycle $C_{a_0 a_1 \cdots a_{l-1}}$ and each cycle index $a_{\bar{p}(0)} a_{\bar{p}(1)} \cdots a_{\bar{p}(l-k-1)}$, the position on the host cycle $C_{a_{p(0)} a_{p(1)} \cdots a_{p(k-1)}}$. [On each host cycle $C_{a_{p(0)} a_{p(1)} \cdots a_{p(k-1)}}$, $a_{p(0)}, a_{p(1)}, \ldots, a_{p(k-1)} \in \{0, 1\}$, the same distribution function is used.] Formally, the embedding $f : V(\mathrm{CCC}(l))/V(\mathrm{BFN}(l)) \to V(\mathrm{CCC}(k))/V(\mathrm{BFN}(k))$ is of the form

$$f(i, a_0 a_1 \cdots a_{l-1}) := (d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}).$$

Again, the load of $f$ is determined by the distribution function $d$. Therefore, $d$ should distribute the guest nodes as evenly as possible on each host cycle. All the cross-edges

$$(i, a) - (i, a(i)), \quad i \in \{p(0), p(1), \ldots, p(k-1)\},$$
$$(i, a) - ((i+1) \bmod l, a(i)), \quad i \in \{p(0), p(1), \ldots, p(k-1)\}$$

of CCC($l$)/BFN($l$) are mapped onto a path consisting of one corresponding cross-edge in CCC($k$)/BFN($k$) and two (possibly empty) paths on two different cycles $C_{b_1}$, $C_{b_2}$ in CCC($k$)/BFN($k$). All the other edges of CCC($l$)/BFN($l$) are mapped onto a path on a single cycle $C_b$ in CCC($k$)/BFN($k$). In both cases, the dilation is directly dependent on the distribution $d$ of the guest nodes on the host cycle and stands partly in contrast to the desired evenness of the distribution as explained above. For low dilation, the values of $p(0), p(1), \ldots, p(k-1)$ should be spread relatively evenly among $0, 1, \ldots, l-1$, and the nodes $(i, a_0 a_1 \cdots a_{l-1})$ and $(j, b_0 b_1 \cdots b_{l-1})$ of the cycles $C_{a_1}, C_{a_2}, \ldots, C_{a_{2^{l-k}}}$ of CCC($l$)/BFN($l$) with a small lexicographical distance between $(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)})$ and $(j, b_{\bar{p}(0)} \cdots b_{\bar{p}(l-k-1)})$ should be mapped close together on the cycle $C_b$ in CCC($k$)/BFN($k$).

Overall, note that the first construction is a special case of the second one by specifying

$$p(i) = i, \quad 0 \le i \le k-1,$$

$$d_2(i, b_0 b_1 \cdots b_{l-k-1}) = \begin{cases} i & \text{if } 0 \le i \le k-1, \\ d_1(i, b_0 b_1 \cdots b_{l-k-1}) & \text{if } k \le i \le l-1, \end{cases}$$

where $d_1$ and $d_2$ denote the distribution $d$ in the first and the second constructions, respectively.

3.2. Dilation 2 Embedding of the CCC and the BFN

Theorem 1. Let $k$, $l$ be positive integers, $l > k$.

1. There is a dilation 2 embedding of BFN($l$) into CCC($k$) with optimum load $(l/k)\,2^{l-k}$.
2. There is a dilation 2 embedding of CCC($l$) into CCC($k$) with optimum load $(l/k)\,2^{l-k}$.
3. There is a dilation 2 embedding of BFN($l$) into BFN($k$) with optimum load $(l/k)\,2^{l-k}$.

Proof.

Claim 1. BFN($l$) can be embedded into CCC($k$) with dilation 2 and optimum load by specifying $d$ and $p$ for the embedding $f$ in the second construction of Section 3.1 as follows:



Choose $d$ as an even distribution of $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$ among the nodes of $\{0, 1, \ldots, k-1\}$ according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$. As $l/k > 1$, $|d^{-1}(j)| \ge 2^{l-k}$ for all $j = 0, 1, \ldots, k-1$, and we can choose $p(0), p(1), \ldots, p(k-1) \in \{0, 1, \ldots, l-1\}$, $p(0) < p(1) < \cdots < p(k-1)$ such that

$$d(p(i), 1^{l-k}) = i \quad \text{for all } 0 \le i \le k-1. \qquad (*)$$

[This ensures that $i-1 \le d(p(i), b) \le i$ for all $i \in \{1, 2, \ldots, k-1\}$, $b \in \{0, 1\}^{l-k}$.]

Now, the embedding $f$ of BFN($l$) into CCC($k$) is defined as in the second construction of Section 3.1. Let $\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1) \in \{0, 1, \ldots, l-1\} \setminus \{p(0), p(1), \ldots, p(k-1)\}$ such that $\bar{p}(0) < \bar{p}(1) < \cdots < \bar{p}(l-k-1)$. Then,

$$f(i, a_0 a_1 \cdots a_{l-1}) := (d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}).$$

As the distribution $d$ is even, $f$ has optimum load $(l/k)\,2^{l-k}$. (The nodes of $2^{l-k}$ cycles of length $l$ are distributed evenly on a cycle of length $k$.) To prove dilation 2 for $f$, the following cases have to be distinguished with regard to the edges in BFN($l$):

Let $0 \le i \le l-1$, $a = a_0 a_1 \cdots a_{l-1} \in \{0, 1\}^l$. Let $a(i) = a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{l-1}$.

(a) $(i, a) - ((i+1) \bmod l, a)$:
The two vertices are mapped onto the same or onto adjacent nodes of the same cycle in CCC($k$); hence, dilation 1.

(b) $(i, a) - ((i+1) \bmod l, a(i))$, $i \notin \{p(0), \ldots, p(k-1)\}$:
The two vertices are mapped onto nodes of the same cycle in CCC($k$) with maximum distance 2; hence, dilation 2.

(c) $(i, a) - ((i+1) \bmod l, a(i))$, $i = p(m)$, $0 \le m \le k-1$:
$f$ maps $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$ onto

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)})$$

and $((i+1) \bmod l, a(i))$ onto

$$(d((i+1) \bmod l, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)}).$$

Now, a path in CCC($k$) of maximum length 2 is stated between these two image nodes [case distinction due to (*)]:

1. $d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) = m$:

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}), \quad \text{where the first component equals } m$$

$\downarrow$ (1 step over a cross-edge)

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)})$$

$\downarrow$ (1 or no step on the cycle)

$$(d((i+1) \bmod l, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)}).$$

2. $d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) = m-1$:

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}), \quad \text{where the first component equals } m-1$$

$\downarrow$ (1 step on the cycle)

$$(m,\ a_{p(0)} \cdots a_{p(k-1)})$$

$\downarrow$ (1 step over a cross-edge)

$$(m,\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)})$$

$=$ [from (*): $m = d(i, 1^{l-k}) \le d((i+1) \bmod l, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) \le d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) + 1 = m$]

$$(d((i+1) \bmod l, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)}).$$

This completes the proof of Claim 1.

Claims 2 and 3. As CCC($n$) is a subgraph of BFN($n$) [9], there is also a dilation 1 embedding of CCC($n$) into BFN($n$). Hence, an embedding of CCC($l$) into CCC($k$) with dilation 2 and optimum load is obtained by first embedding CCC($l$) into BFN($l$) and then BFN($l$) into CCC($k$). An embedding of BFN($l$) into BFN($k$) with dilation 2 and optimum load can be derived analogously by first embedding BFN($l$) into CCC($k$) and then CCC($k$) into BFN($k$).
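The construction in Claim 1 can be checked by brute force on small instances. The following is a minimal sketch, not the authors' implementation: the helper names (`flip`, `lex`, `ccc_neighbors`, `capped_distance`, `embed_bfn_into_ccc`) are hypothetical, the even distribution is one particular choice (the element with lexicographical number $x$ goes to host node $\lfloor xk/(l\,2^{l-k}) \rfloor$), and $p(m)$ is taken as the first level satisfying (*). It assumes $k \ge 3$ so that the host cycles are proper cycles.

```python
from collections import Counter
from itertools import product


def flip(bits, i):
    """Return a copy of the bit tuple with position i complemented."""
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def lex(i, bits):
    """Lex(i, b) = i * 2^r + (b read as a binary number, b_0 most significant)."""
    value = i
    for bit in bits:
        value = 2 * value + bit
    return value


def ccc_neighbors(k, v):
    """Neighbours of v = (level, string) in the wrapped CCC(k)."""
    i, b = v
    return {((i + 1) % k, b), ((i - 1) % k, b), (i, flip(b, i))}


def capped_distance(k, u, v):
    """Distance between u and v in CCC(k); 3 stands for 'more than 2'."""
    if u == v:
        return 0
    near_u = ccc_neighbors(k, u)
    if v in near_u:
        return 1
    return 2 if near_u & ccc_neighbors(k, v) else 3


def embed_bfn_into_ccc(l, k):
    """Return (dilation, load) of the sketched embedding BFN(l) -> CCC(k)."""
    r = l - k
    total = l * 2 ** r                       # guest nodes per host cycle

    def d(i, bits):                          # assumed choice of an even distribution
        return lex(i, bits) * k // total

    ones = (1,) * r
    # p(m): first level whose last element (p, 1^r) lands on host node m, cf. (*)
    p = [min(i for i in range(l) if d(i, ones) == m) for m in range(k)]
    p_bar = [i for i in range(l) if i not in p]

    def f(v):
        i, a = v
        return d(i, tuple(a[j] for j in p_bar)), tuple(a[j] for j in p)

    load = Counter()
    dilation = 0
    for i in range(l):
        for a in product((0, 1), repeat=l):
            load[f((i, a))] += 1
            nxt = (i + 1) % l
            for w in ((nxt, a), (nxt, flip(a, i))):   # cycle-edge, cross-edge
                dilation = max(dilation, capped_distance(k, f((i, a)), f(w)))
    return dilation, max(load.values())


print(embed_bfn_into_ccc(5, 3))   # (2, 7)
```

For $l = 5$, $k = 3$ the host cycle receives $5 \cdot 2^2 = 20$ guest nodes on 3 host nodes, so the best possible load is 7, which the sketch attains together with dilation 2.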



3.3. Dilation 1 Embedding of the CCC

Theorem 2. Let $k$, $l$ be positive integers, $l > k$. There is a dilation 1 embedding of CCC($l$) into CCC($k$) with load

$$\begin{cases} \dfrac{l}{k}\,2^{l-k} & \text{for } \dfrac{l}{k} \ge 2, \\[2mm] \dfrac{2p-1}{p}\,2^{l-k} & \text{for } p \in \{2, 3, \ldots\} \text{ such that } \dfrac{2p-3}{p-1} < \dfrac{l}{k} \le \dfrac{2p-1}{p}. \end{cases}$$

Proof.

(A) $l/k \ge 2$:

Each of the two constructions of Section 3.1 can be adapted to yield an embedding of CCC($l$) into CCC($k$) with dilation 1 and optimum load. It can be shown that the edge congestion

• Of the first construction is at least $2 \cdot 2^{l-k}$ and at most $\frac{5}{2}\,2^{l-k}$.
• Of the second construction is at least $2^{l-k}$ and at most $\frac{3}{2}\,2^{l-k}$.

Therefore, the second embedding should be preferred.

First Construction: For the embedding $f$ in the first construction of Section 3.1, the distribution $d$ is specified as follows: Choose $d$ as an even distribution of $\{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k}$ among the nodes of $\{0, 1, \ldots, k-1\}$ according to the lexicographical order on $\{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k}$.

Now, the embedding $f$ of CCC($l$) into CCC($k$) is defined as in the first construction of Section 3.1:

$$f(i, a_0 a_1 \cdots a_{l-1}) := \begin{cases} (i, a_0 \cdots a_{k-1}) & \text{if } 0 \le i \le k-1, \\ (d(i, a_k a_{k+1} \cdots a_{l-1}), a_0 \cdots a_{k-1}) & \text{else.} \end{cases}$$

As the distribution $d$ is even, $f$ has optimum load $(l/k)\,2^{l-k}$. (The nodes of $2^{l-k}$ cycles of length $l$ are distributed evenly on a cycle of length $k$.) To prove dilation 1 for $f$, the following cases have to be distinguished with regard to the edges in CCC($l$):

Let $0 \le i \le l-1$, $a = a_0 a_1 \cdots a_{l-1} \in \{0, 1\}^l$. Let $a(i) = a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{l-1}$.

(a) $(i, a) - ((i+1) \bmod l, a)$:
As $l/k \ge 2$, the two vertices are mapped onto the same or onto adjacent nodes of the same cycle in CCC($k$); hence, dilation 1.

(b) $(i, a) - (i, a(i))$, $k \le i \le l-1$:
As $l/k \ge 2$, the two vertices are mapped onto the same or onto adjacent nodes of the same cycle in CCC($k$); hence, dilation 1.

(c) $(i, a) - (i, a(i))$, $0 \le i \le k-1$:
$f$ maps $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$ onto

$$(i, a_0 \cdots a_{k-1})$$

and $(i, a(i))$ onto

$$(i, a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{k-1}).$$

There is an edge between these two image nodes in CCC($k$).

Second Construction (cf. proof of Theorem 1): For the embedding $f$ in the second construction of Section 3.1, the distribution $d$ and the indices $p(i)$ are specified as follows:

Choose $d$ as an even distribution of $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$ among the nodes of $\{0, 1, \ldots, k-1\}$ according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$. As $l/k \ge 2$, $|d^{-1}(j)| \ge 2 \cdot 2^{l-k} - 1$ for all $j = 0, 1, \ldots, k-1$, and we can choose $p(0), p(1), \ldots, p(k-1) \in \{0, 1, \ldots, l-1\}$, $p(0) < p(1) < \cdots < p(k-1)$ such that

$$d(p(i), b) = i \quad \text{for all } 0 \le i \le k-1,\ b \in \{0, 1\}^{l-k}. \qquad (*)$$

Now, the embedding $f$ of CCC($l$) into CCC($k$) is defined as in the second construction of Section 3.1. Let $\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1) \in \{0, 1, \ldots, l-1\} \setminus \{p(0), p(1), \ldots, p(k-1)\}$ such that $\bar{p}(0) < \bar{p}(1) < \cdots < \bar{p}(l-k-1)$. Then,

$$f(i, a_0 a_1 \cdots a_{l-1}) := (d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}).$$

As the distribution $d$ is even, $f$ has optimum load $(l/k)\,2^{l-k}$. (The nodes of $2^{l-k}$ cycles of length $l$ are distributed evenly on a cycle of length $k$.) To prove dilation 1 for $f$, the following cases have to be distinguished with regard to the edges in CCC($l$):

Let $0 \le i \le l-1$, $a = a_0 a_1 \cdots a_{l-1} \in \{0, 1\}^l$. Let $a(i) = a_0 \cdots a_{i-1}\bar{a}_i a_{i+1} \cdots a_{l-1}$.



(a) $(i, a) - ((i+1) \bmod l, a)$:
The two vertices are mapped onto the same or onto adjacent nodes of the same cycle in CCC($k$); hence, dilation 1.

(b) $(i, a) - (i, a(i))$, $i \notin \{p(0), \ldots, p(k-1)\}$:
The two vertices are mapped onto the same or onto adjacent nodes of the same cycle in CCC($k$); hence, dilation 1.

(c) $(i, a) - (i, a(i))$, $i = p(m)$, $0 \le m \le k-1$:
$f$ maps $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$ onto

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)})$$

and $(i, a(i))$ onto

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)}).$$

From (*),

$$d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) = d(p(m), a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}) = m.$$

Hence, there is an edge in CCC($k$) between the two image nodes

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)})$$

and

$$(d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(m-1)}\bar{a}_{p(m)} a_{p(m+1)} \cdots a_{p(k-1)}).$$

(B) $\dfrac{2p-3}{p-1} < \dfrac{l}{k} \le \dfrac{2p-1}{p}$ for $p \in \{2, 3, \ldots\}$:

CCC($l$) can be embedded into CCC($k$) with dilation 1 and load $[(2p-1)/p]\,2^{l-k}$ by specifying $d$ and $p$ for the embedding $f$ in the second construction of Section 3.1 as described below.

Note that the first construction of Section 3.1 does not yield dilation 1 when $l < 2k - 2$. The second construction still works for dilation 1, but an optimum load cannot be guaranteed any longer. The load can only be balanced in certain sections of each cycle $C_b$ in CCC($k$). The aim is to make these sections as large as possible and almost equally long. To achieve this, the values of $p(0), \ldots, p(k-1)$ must be spread evenly among $0, 1, \ldots, l-1$. Let $p(0), p(1), \ldots, p(k-1)$ be defined by

$$p(i) := \left\lceil \frac{i\,l}{k} \right\rceil.$$

Let $d$ denote a distribution of the elements of $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$ among $0, 1, \ldots, k-1$ according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$, that is, $d : \{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ such that

• $d(0, 0^{l-k}) = 0$, $d(l-1, 1^{l-k}) = k-1$,
• $d(i, b) \le d(i', b')$, if $(i, b) \le (i', b')$ according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$,

satisfying the additional properties

• $|d^{-1}(j)| \le [(2p-1)/p]\,2^{l-k}$ for all $j = 0, 1, \ldots, k-1$ (i.e., $d$ is "almost" even),
• $d(p(i), b) = i$ for all $0 \le i \le k-1$, $b \in \{0, 1\}^{l-k}$. (*)

The distribution $d$ can be constructed as follows: Note that

1. $p(0) = 0$, $p(k-1) = l-1$,
2. $1 \le p(i+1) - p(i) \le 2$ for all $0 \le i \le k-2$,
3. $p(i+p) - p(i) \le 2p-1$ for all $0 \le i \le k-p-1$

(see Claims 1–3 below). Let $0 \le i_1 < i_2 \le k-1$ such that

• $(i_1 = 0)$ or $(i_1 > 0 \wedge p(i_1) - p(i_1 - 1) = 1)$,
• $(i_2 = k-1)$ or $(i_2 < k-1 \wedge p(i_2 + 1) - p(i_2) = 1)$,
• $p(i+1) - p(i) = 2$ for all $i_1 \le i < i_2$.

Then, $d = d(i, b)$ is constructed for $i_1 \le i \le i_2$ as an even distribution of the elements of $\{p(i_1), p(i_1)+1, \ldots, p(i_2)\} \times \{0, 1\}^{l-k}$ among $i_1, i_1+1, \ldots, i_2$ according to the lexicographical order on $\{0, 1, \ldots, l-1\} \times \{0, 1\}^{l-k}$.

Let $0 \le i_1 < i_2 \le k-1$ as above. Let $s = i_2 - i_1 + 1$. Then,

• $s \le p$ (according to property 3 of $p$) and
• $p(i_2) - p(i_1) + 1 = 2s - 1$ (according to the choice of $i_1$, $i_2$).

Hence, $d$ distributes $(2s-1) \cdot 2^{l-k}$ nodes evenly onto $s$ nodes, thus yielding load

$$\frac{2s-1}{s} \cdot 2^{l-k} \le \frac{2p-1}{p} \cdot 2^{l-k}$$

for the nodes $i_1, i_1+1, \ldots, i_2$. Also, according to



Claim 4 below, $d(p(i), b) = i$ for all $i_1 \le i \le i_2$, $b \in \{0, 1\}^{l-k}$.

Now, the embedding $f$ of CCC($l$) into CCC($k$) is defined as in the second construction of Section 3.1. Let $\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l-k-1) \in \{0, 1, \ldots, l-1\} \setminus \{p(0), p(1), \ldots, p(k-1)\}$ such that $\bar{p}(0) < \bar{p}(1) < \cdots < \bar{p}(l-k-1)$. Then,

$$f(i, a_0 a_1 \cdots a_{l-1}) := (d(i, a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}).$$

From the discussion above, it follows that $f$ has load $[(2p-1)/p]\,2^{l-k}$. Using property (*), dilation 1 for $f$ is proved exactly as in the second construction of (A). ∎

Claim 1. $p(0) = 0$, $p(k-1) = l-1$.

$$p(0) = \left\lceil 0 \cdot \frac{l}{k} \right\rceil = 0,$$

$$p(k-1) = \left\lceil (k-1) \cdot \frac{l}{k} \right\rceil = \left\lceil l - \frac{l}{k} \right\rceil = l - 1 \quad \left(\text{as } 1 < \frac{l}{k} < 2\right).$$

Claim 2. $1 \le p(i+1) - p(i) \le 2$ for all $0 \le i \le k-2$.

$$p(i+1) - p(i) = \left\lceil \frac{(i+1)\,l}{k} \right\rceil - \left\lceil \frac{i\,l}{k} \right\rceil \le \left\lceil \frac{(i+1)\,l}{k} - \frac{i\,l}{k} \right\rceil = \left\lceil \frac{l}{k} \right\rceil = 2,$$

$$p(i+1) - p(i) = \left\lceil \frac{(i+1)\,l}{k} \right\rceil - \left\lceil \frac{i\,l}{k} \right\rceil \ge \left\lfloor \frac{(i+1)\,l}{k} - \frac{i\,l}{k} \right\rfloor = \left\lfloor \frac{l}{k} \right\rfloor = 1.$$

Claim 3. $p(i+p) - p(i) \le 2p-1$ for all $0 \le i \le k-p-1$.

$$p(i+p) - p(i) = \left\lceil \frac{(i+p)\,l}{k} \right\rceil - \left\lceil \frac{i\,l}{k} \right\rceil \le \left\lceil \frac{(i+p)\,l}{k} - \frac{i\,l}{k} \right\rceil = \left\lceil p \cdot \frac{l}{k} \right\rceil \le \left\lceil p \cdot \frac{2p-1}{p} \right\rceil = 2p - 1.$$

Claim 4. $d(p(i), b) = i$ for all $i_1 \le i \le i_2$, $b \in \{0, 1\}^{l-k}$.

It suffices to show that

(a) $d(p(i), 0^{l-k}) \ge i$ for all $i_1 \le i \le i_2$,
(b) $d(p(i), 1^{l-k}) \le i$ for all $i_1 \le i \le i_2$.

Proof of (a). The number of nodes before $(p(i), 0^{l-k})$ [i.e., between $(p(i_1), 0^{l-k})$ and $(p(i) - 1, 1^{l-k})$] in lexicographical order is $(p(i) - p(i_1)) \cdot 2^{l-k}$. If this number is at least the capacity $(i - i_1) \cdot [(2s-1)/s] \cdot 2^{l-k}$ of the nodes before $i$ [i.e., between $i_1$ and $i-1$], then (a) follows. Hence, we have to check that

$$(p(i) - p(i_1)) \cdot 2^{l-k} \ge (i - i_1) \cdot \frac{2s-1}{s} \cdot 2^{l-k}. \qquad (1)$$

If $i = i_1$, then (1) is clearly true. If $i_1 < i \le i_2$, then

$$\frac{p(i) - p(i_1)}{i - i_1} \cdot 2^{l-k} = \frac{2 \cdot (i - i_1)}{i - i_1} \cdot 2^{l-k} = 2 \cdot 2^{l-k} \ge \frac{2s-1}{s} \cdot 2^{l-k},$$

and (1) follows.

Proof of (b). The number of nodes after $(p(i), 1^{l-k})$ [i.e., between $(p(i) + 1, 0^{l-k})$ and $(p(i_2), 1^{l-k})$] in lexicographical order is $(p(i_2) - p(i)) \cdot 2^{l-k}$. If this number is at least the capacity $(i_2 - i) \cdot [(2s-1)/s] \cdot 2^{l-k}$ of the nodes after $i$ [i.e., between $i+1$ and $i_2$], then (b) follows. Hence, we have to check that

$$(p(i_2) - p(i)) \cdot 2^{l-k} \ge (i_2 - i) \cdot \frac{2s-1}{s} \cdot 2^{l-k}. \qquad (2)$$

If $i = i_2$, then (2) is clearly true. If $i_1 \le i < i_2$, then

$$\frac{p(i_2) - p(i)}{i_2 - i} \cdot 2^{l-k} = \frac{2 \cdot (i_2 - i)}{i_2 - i} \cdot 2^{l-k} = 2 \cdot 2^{l-k} \ge \frac{2s-1}{s} \cdot 2^{l-k},$$

and (2) follows. ∎

Remark. It is clear that the above construction yields dilation 1 and optimum load if $\dfrac{l}{k} = \dfrac{2p-1}{p}$ for some $p \in \{2, 3, \ldots\}$.
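Claims 1–3 and the section structure used in part (B) lend themselves to a quick numerical check. The sketch below is illustrative only and rests on stated assumptions: it takes $p(i) = \lceil i\,l/k \rceil$ for $k < l < 2k$, the function names (`cut_levels`, `load_parameter`, `sections`, `check`) are hypothetical, the load parameter $p$ of the theorem is computed as $\lceil k/(2k-l) \rceil$ (the smallest $p$ with $l/k \le (2p-1)/p$), and, unlike the text, sections consisting of a single index are also listed.

```python
from fractions import Fraction


def cut_levels(l, k):
    """p(i) = ceil(i * l / k) for i = 0..k-1, in integer arithmetic."""
    return [-(-i * l // k) for i in range(k)]


def load_parameter(l, k):
    """Smallest p >= 2 with l/k <= (2p - 1)/p, assuming k < l < 2k."""
    return -(-k // (2 * k - l))


def sections(p):
    """Maximal index ranges (i1, i2) whose consecutive p-values differ by 2."""
    result, start = [], 0
    for i in range(1, len(p)):
        if p[i] - p[i - 1] == 1:          # a gap of 1 closes the current section
            result.append((start, i - 1))
            start = i
    result.append((start, len(p) - 1))
    return result


def check(l, k):
    """Verify Claims 1-3 and the per-section load bound for one pair (l, k)."""
    assert k < l < 2 * k
    p, q = cut_levels(l, k), load_parameter(l, k)
    assert p[0] == 0 and p[-1] == l - 1                              # Claim 1
    assert all(1 <= p[i + 1] - p[i] <= 2 for i in range(k - 1))      # Claim 2
    assert all(p[i + q] - p[i] <= 2 * q - 1 for i in range(k - q))   # Claim 3
    for i1, i2 in sections(p):
        s = i2 - i1 + 1
        assert p[i2] - p[i1] + 1 == 2 * s - 1        # 2s - 1 guest levels per section
        assert Fraction(2 * s - 1, s) <= Fraction(2 * q - 1, q)      # load bound
    return p, q, sections(p)


print(check(8, 5))   # ([0, 2, 4, 5, 7], 3, [(0, 2), (3, 4)])
```

For $(l, k) = (8, 5)$ the sections have $s = 3$ and $s = 2$ host nodes, giving per-node loads of $\frac{5}{3} \cdot 2^{3}$ and $\frac{3}{2} \cdot 2^{3}$, both within the bound $\frac{2p-1}{p} \cdot 2^{l-k}$ for $p = 3$.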



It can be shown that the only other cases with dilation 1 and optimum load for this construction are

$$l = k + 1,$$
$$(l, k) \in \{(7, 5), (8, 6), (9, 7), (10, 7), (13, 9)\}.$$

Therefore, the smallest nonoptimal pairs $(l, k)$ with this construction are (8, 5), (11, 7), (12, 7), (10, 8), (11, 8), and (13, 8).

3.4. Dilation 1 Embedding of the BFN

Theorem 3. Let $k$, $l$ be positive integers, $l > k$. There is a dilation 1 embedding of BFN($l$) into BFN($k$) with load

$$\begin{cases} \dfrac{l}{k}\,2^{l-k} & \text{for } \dfrac{l}{k} \ge 2, \\[2mm] \dfrac{2p-2}{p}\,2^{l-k} & \text{for } p \in \{7, 8, \ldots\} \text{ such that } \dfrac{2p-4}{p-1} < \dfrac{l}{k} \le \dfrac{2p-2}{p}, \\[2mm] \dfrac{5}{3}\,2^{l-k} & \text{for } \dfrac{l}{k} \le \dfrac{5}{3}. \end{cases}$$

Proof.

(A) $l/k \ge 2$:

BFN($l$) can be embedded into BFN($k$) with dilation 1 and optimum load by specifying $d$ for the embedding $f$ in the first construction of Section 3.1 as described below.

Note that the same embeddings as in the proof of Theorem 2 for the CCC network do not work as well for the BFN. The second embedding only yields dilation 2, because the cross-edges

$$(i, a) - ((i+1) \bmod l, a(i)), \quad i \in \{p(0), p(1), \ldots, p(k-1)\}$$

might be stretched to length 2. The first embedding only achieves dilation 1 and optimum load for $l/k \ge \frac{9}{4}$. If $2 \le l/k < \frac{9}{4}$, then the cross-edges

$$(i, a) - ((i+1) \bmod l, a(i)), \quad k+1 \le i \le l-1$$

might be stretched to length 2, because the lexicographical distance between $(i, a_k a_{k+1} \cdots a_{l-1})$ and $((i+1) \bmod l, a_k a_{k+1} \cdots \bar{a}_i \cdots a_{l-1})$ can be up to $\frac{5}{4} \cdot 2^{l-k}$ (for $i = k+1$). So, if we distribute the nodes on each cycle in BFN($k$) according to the lexicographical order, each node of BFN($k$) must have capacity more than $\frac{9}{4} \cdot 2^{l-k}$ in order to guarantee dilation 1.

The problem for $2 \le l/k < \frac{9}{4}$ can be overcome by using a slightly different distribution $d$ for the nodes on each host cycle. The idea is to distribute the elements in the first and the second half of each cycle in a different way. Before distributing the nodes evenly in lexicographical order, the crucial bit $a_i$ of each node $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$, $k+1 \le i \le l-1$, is shifted toward the end of the string in order to reduce the lexicographical distance between $(i, a_k a_{k+1} \cdots a_{l-1})$ and $(i+1, a_k a_{k+1} \cdots \bar{a}_i \cdots a_{l-1})$, $k+1 \le i \le l-2$. This can be done by reversing the part $a_{k+1} a_{k+2} \cdots a_{l-1}$ in the first half of the cycle in BFN($k$) [i.e., if $k \le i \le (l+k)/2 - 1$]. In the second half [i.e., if $(l+k)/2 \le i \le l-1$], no change is needed. As we will see later on, for the edges

$$(i, a) - (i+1, a), \quad (i, a) - (i+1, a(i)) \quad \text{for } i = \frac{l+k}{2} - 1$$

in the middle of the cycle, it is important to leave the highest bit $a_k$ in its original position and only to reverse the remaining part $a_{k+1} a_{k+2} \cdots a_{l-1}$ in the first half of the cycle. Also, to guarantee dilation 1 for these edges, the nodes must be aligned in a proper way in the middle of each cycle in BFN($k$) [cf. property (*) of $\tilde{d}$ below].

Formally, let $\tilde{d} : \{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k} \to \{0, 1, \ldots, k-1\}$ be an even distribution of the elements of $\{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k}$ among $0, 1, \ldots, k-1$ according to the lexicographical order on $\{k, k+1, \ldots, l-1\} \times \{0, 1\}^{l-k}$, satisfying the additional property

$$|\tilde{d}^{-1}(j)| = |\tilde{d}^{-1}(k-1-j)| \quad \text{for all } j = 0, 1, \ldots, \frac{k}{2} - 1. \qquad (*)$$

(This aligns the elements in the middle of each cycle.) Modify $\tilde{d}$ to the distribution $d$ as follows:



\[
d(i, a_k a_{k+1} \cdots a_{l-1}) :=
\begin{cases}
\tilde{d}(i, a_k a_{l-1} a_{l-2} \cdots a_{k+1}) & \text{if } k \le i \le \dfrac{l + k}{2} - 1,\\[2mm]
\tilde{d}(i, a_k a_{k+1} \cdots a_{l-1}) & \text{if } \dfrac{l + k}{2} \le i \le l - 1.
\end{cases}
\]

(The elements in the first and the second half of each cycle are distributed in a different way.) Now, define the embedding f of BFN(l) into BFN(k) as in the first construction of Section 3.1:

\[
f(i, a_0 a_1 \cdots a_{l-1}) :=
\begin{cases}
(i,\ a_0 \cdots a_{k-1}) & \text{if } 0 \le i \le k - 1,\\
(d(i, a_k a_{k+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1}) & \text{else.}
\end{cases}
\]

As the distribution d is even, f has optimum load $(l/k)\, 2^{l-k}$. (The nodes of $2^{l-k}$ cycles of length l are distributed evenly on a cycle of length k.) To prove dilation 1 for f, the following cases have to be distinguished with regard to the edges in BFN(l):

Let $0 \le i \le l - 1$, $a = a_0 a_1 \cdots a_{l-1} \in \{0, 1\}^l$. Let $a(i) = a_0 \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1}$.

(a) $(i, a) - ((i + 1) \bmod l,\ a)$:

As $l/k \ge 2$, the two vertices are mapped onto the same or onto adjacent nodes of the same cycle in BFN(k), if $i \ne (l + k)/2 - 1$.

For $i = (l + k)/2 - 1$, $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$ is mapped onto

\[
\left( d\!\left( \frac{l + k}{2} - 1,\ a_k a_{k+1} \cdots a_{l-1} \right),\ a_0 \cdots a_{k-1} \right)
\]

and $((i + 1) \bmod l,\ a)$ onto

\[
\left( d\!\left( \frac{l + k}{2},\ a_k a_{k+1} \cdots a_{l-1} \right),\ a_0 \cdots a_{k-1} \right).
\]

The claim below shows that these two image nodes have at most distance 1 on the cycle $C_{a_0 \cdots a_{k-1}}$.

(b) $(i, a) - ((i + 1) \bmod l,\ a(i))$:

1. $0 \le i < k - 1$:

f maps $(i, a) = (i, a_0 a_1 \cdots a_{l-1})$ onto $(i,\ a_0 \cdots a_{k-1})$ and $((i + 1) \bmod l,\ a(i))$ onto $((i + 1) \bmod l,\ a_0 \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{k-1})$. There is a cross-edge between these two image nodes in BFN(k).

2. $i = k - 1$:

$(i, a) = (k - 1, a_0 a_1 \cdots a_{l-1})$ is mapped onto $(k - 1,\ a_0 \cdots a_{k-1})$ and $((i + 1) \bmod l,\ a(i)) = (k,\ a_0 \cdots a_{k-2} \bar{a}_{k-1} a_k \cdots a_{l-1})$ onto

\[
(d(k, a_k \cdots a_{l-1}),\ a_0 \cdots \bar{a}_{k-1})
= (\tilde{d}(k, a_k a_{l-1} \cdots a_{k+1}),\ a_0 \cdots \bar{a}_{k-1})
= (0,\ a_0 \cdots \bar{a}_{k-1}).
\]

There is a cross-edge between these two image nodes in BFN(k).

3. $i = k$:

$(i, a)$ is mapped onto

\[
(d(k, a_k \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(k, a_k a_{l-1} \cdots a_{k+1}),\ a_0 \cdots a_{k-1})
\]

and $((i + 1) \bmod l,\ a(i)) = (k + 1,\ a_0 \cdots a_{k-1} \bar{a}_k a_{k+1} \cdots a_{l-1})$ onto

\[
(d(k + 1, \bar{a}_k a_{k+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(k + 1, \gamma),\ a_0 \cdots a_{k-1}), \qquad \gamma \in \{0, 1\}^{l-k}.
\]

As $\tilde{d}(k, a_k a_{l-1} \cdots a_{k+1}) = 0$ and $0 \le \tilde{d}(k + 1, \gamma) \le 1$, the distance between the two image nodes is at most 1 on the cycle $C_{a_0 \cdots a_{k-1}}$.

4. $k < i < \frac{l + k}{2} - 1$:

If $l = 2k$, then $f(i, a_0 a_1 \cdots a_{l-1}) = (i \bmod k,\ a_0 \cdots a_{k-1})$, and the two vertices are mapped onto adjacent nodes of the same cycle.

Let $l > 2k$. $(i, a)$ is mapped onto

\[
(d(i, a_k \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(i, a_k a_{l-1} \cdots a_{k+1}),\ a_0 \cdots a_{k-1})
\]

and $((i + 1) \bmod l,\ a(i))$ onto

\[
(d(i + 1, a_k \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(i + 1, a_k a_{l-1} \cdots a_{i+1} \bar{a}_i a_{i-1} \cdots a_{k+1}),\ a_0 \cdots a_{k-1}).
\]



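The lexicographical distance between $(i, a_k a_{l-1} \cdots a_{k+1})$ and $(i + 1, a_k a_{l-1} \cdots a_{i+1} \bar{a}_i a_{i-1} \cdots a_{k+1})$ is at most

\[
\begin{aligned}
2^{l-k} + 2^{i-(k+1)} &\le 2^{l-k} + 2^{(l+k)/2 - 2 - (k+1)}
= 2^{l-k} + 2^{(l-k)/2 - 3}
= \left( 1 + \left( \tfrac{1}{2} \right)^{(l-k)/2 + 3} \right) 2^{l-k} \\
&\le \left( 1 + \left( \tfrac{1}{2} \right)^{(l-k)/2} \right) 2^{l-k}
\le^{*} \left( 1 + \left( \tfrac{1}{2} \right)^{(k+1)/2} \right) 2^{l-k} \\
&\le \left( 1 + \tfrac{1}{k} \right) 2^{l-k}
\le^{*} \frac{l-k}{k}\, 2^{l-k}.
\end{aligned}
\]

[* $l \ge 2k + 1$]

Hence, the distance between the two image nodes is at most 1 on the cycle. [The capacity of each node in BFN(k) is sufficient to keep all the nodes between $(i, a)$ and $(i + 1, a(i))$.]

5. $k < i = \frac{l + k}{2} - 1$:

Again, the distance between the two image nodes $(d(i, a_k \cdots a_{l-1}),\ a_0 \cdots a_{k-1})$ and $(d(i + 1, a_k \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})$ is at most 1 (cf. claim below).

6. $\frac{l + k}{2} \le i < l - 1$: (Cf. 4.)

If $l = 2k$, then $f(i, a_0 a_1 \cdots a_{l-1}) = (i \bmod k,\ a_0 \cdots a_{k-1})$, and the two vertices are mapped onto adjacent nodes of the same cycle.

Let $l > 2k$. $(i, a)$ is mapped onto

\[
(d(i, a_k \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(i, a_k a_{k+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
\]

and $((i + 1) \bmod l,\ a(i))$ onto

\[
(d(i + 1, a_k \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(i + 1, a_k \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1}).
\]

The lexicographical distance between $(i, a_k a_{k+1} \cdots a_{l-1})$ and $(i + 1, a_k a_{k+1} \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{l-1})$ is at most

\[
\begin{aligned}
2^{l-k} + 2^{l-1-i} &\le 2^{l-k} + 2^{l - 1 - (l+k)/2}
= 2^{l-k} + 2^{(l-k)/2 - 1}
= \left( 1 + \left( \tfrac{1}{2} \right)^{(l-k)/2 + 1} \right) 2^{l-k} \\
&\le \left( 1 + \left( \tfrac{1}{2} \right)^{(l-k)/2} \right) 2^{l-k}
\le^{*} \left( 1 + \left( \tfrac{1}{2} \right)^{(k+1)/2} \right) 2^{l-k} \\
&\le \left( 1 + \tfrac{1}{k} \right) 2^{l-k}
\le^{*} \frac{l-k}{k}\, 2^{l-k}.
\end{aligned}
\]

[* $l \ge 2k + 1$]

Hence, the distance between the two image nodes is at most 1 on the cycle. [The capacity of each node in BFN(k) is sufficient to keep all the nodes between $(i, a)$ and $(i + 1, a(i))$.]

7. $i = l - 1$: (Cf. 2.)

$(i, a) = (l - 1, a_0 a_1 \cdots a_{l-1})$ is mapped onto

\[
(d(l - 1, a_k \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (\tilde{d}(l - 1, a_k a_{k+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1})
= (k - 1,\ a_0 \cdots a_{k-1})
\]

and $((i + 1) \bmod l,\ a(i)) = (0,\ a_0 \cdots a_{l-2} \bar{a}_{l-1})$ onto $(0,\ a_0 \cdots a_{k-1})$.

There is a cycle-edge between these two image nodes in BFN(k). ∎

Claim. $d((l + k)/2,\ a\beta) - d((l + k)/2 - 1,\ a\gamma) \le 1$ for all $a \in \{0, 1\}$, $\beta, \gamma \in \{0, 1\}^{l-k-1}$.

Let $\gamma = c_{k+1} c_{k+2} \cdots c_{l-1}$, $\delta := c_{l-1} c_{l-2} \cdots c_{k+1} \in \{0, 1\}^{l-k-1}$. Then,

\[
d\!\left( \frac{l + k}{2},\ a\beta \right) = \tilde{d}\!\left( \frac{l + k}{2},\ a\beta \right)
\quad \text{and} \quad
d\!\left( \frac{l + k}{2} - 1,\ a\gamma \right) = \tilde{d}\!\left( \frac{l + k}{2} - 1,\ a\delta \right).
\]

Now, four cases are distinguished to prove the claim. In each case, (*) and $l/k \ge 2$ are used:

1. If k even, l even,

\[
\tilde{d}\!\left( \frac{l + k}{2},\ a\beta \right) = \frac{k}{2}, \qquad
\tilde{d}\!\left( \frac{l + k}{2} - 1,\ a\delta \right) = \frac{k}{2} - 1.
\]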



2. If k even, l odd,

\[
\begin{aligned}
a = 0:&\quad \tilde{d}\!\left( \frac{l + k}{2},\ 0\beta \right) = \frac{k}{2} - 1, \qquad
\frac{k}{2} - 2 \le \tilde{d}\!\left( \frac{l + k}{2} - 1,\ 0\delta \right) \le \frac{k}{2} - 1,\\
a = 1:&\quad \tilde{d}\!\left( \frac{l + k}{2},\ 1\beta \right) = \frac{k}{2}, \qquad
\tilde{d}\!\left( \frac{l + k}{2} - 1,\ 1\delta \right) = \frac{k}{2} - 1.
\end{aligned}
\]

3. If k odd, l even,

\[
\tilde{d}\!\left( \frac{l + k}{2},\ a\beta \right) = \frac{k}{2}, \qquad
\frac{k}{2} - 1 \le \tilde{d}\!\left( \frac{l + k}{2} - 1,\ a\delta \right) \le \frac{k}{2}.
\]

4. If k odd, l odd,

\[
\begin{aligned}
a = 0:&\quad \tilde{d}\!\left( \frac{l + k}{2},\ 0\beta \right) = \frac{k}{2}, \qquad
\frac{k}{2} - 1 \le \tilde{d}\!\left( \frac{l + k}{2} - 1,\ 0\delta \right) \le \frac{k}{2},\\
a = 1:&\quad \frac{k}{2} \le \tilde{d}\!\left( \frac{l + k}{2},\ 1\beta \right) \le \frac{k}{2} + 1, \qquad
\tilde{d}\!\left( \frac{l + k}{2} - 1,\ 1\delta \right) = \frac{k}{2}.
\end{aligned}
\]

∎

(B) $1 < l/k < 2$:

The embedding f of BFN(l) into BFN(k) is described in two stages:

First Stage: BFN(l) is embedded into BFN(k) with dilation 1 and load $2 \cdot 2^{l-k}$ by specifying d and p for the embedding f in the second construction of Section 3.1 as described below.

Note that the same embedding as in the proof of Theorem 2 for the CCC network does not work as well for the BFN, because the cross-edges

\[
(i, a) - ((i + 1) \bmod l,\ a(i)), \qquad i \in \{p(0), p(1), \ldots, p(k-1)\},
\]

might be stretched to length 2. For dilation 1, the problem is that in the case of the BFN, because of these cross-edges, not only the nodes $(i, a)$, $i = p(j)$, $0 \le j \le k - 1$, have to be mapped to level j of a cycle in BFN(k), but also the nodes $((i + 1) \bmod l,\ a)$ have to be mapped to level $(j + 1) \bmod k$.

So, once $0 \le p(0) < p(1) < \cdots < p(k - 1) \le l - 1$ are chosen for the second construction of Section 3.1, the distribution $d(i, \beta)$ is already determined for all

\[
i \in \{p(0),\ p(0) + 1,\ p(1),\ p(1) + 1,\ \ldots,\ p(k - 1),\ (p(k - 1) + 1) \bmod l\}.
\]

To achieve a low load, the values of $p(0), p(1), \ldots, p(k - 1)$ must be spread as evenly as possible among $0, 1, \ldots, l - 1$. The best possible load is obtained by specifying

\[
p(0) = 0, \qquad p(k - 1) = l - 1, \qquad p(i) - p(i - 1) \le 2 \ \text{ for all } 0 < i \le k - 1.
\]

($p(0), p(1), \ldots, p(k - 1)$ exist because $l/k < 2$.)

This way, the distribution $d(i, \beta)$ in the second construction of Section 3.1 is defined completely as

\[
d(p(j), \beta) = j, \qquad d((p(j) + 1) \bmod l,\ \beta) = (j + 1) \bmod k \qquad \text{for all } 0 \le j \le k - 1.
\]

Now, the embedding f of BFN(l) into BFN(k) is defined as in the second construction of Section 3.1. Let $\bar{p}(0), \bar{p}(1), \ldots, \bar{p}(l - k - 1) \in \{0, 1, \ldots, l - 1\} \setminus \{p(0), p(1), \ldots, p(k - 1)\}$ such that $\bar{p}(0) < \bar{p}(1) < \cdots < \bar{p}(l - k - 1)$. Then,

\[
f(i, a_0 a_1 \cdots a_{l-1}) = (d(i,\ a_{\bar{p}(0)} \cdots a_{\bar{p}(l-k-1)}),\ a_{p(0)} \cdots a_{p(k-1)}).
\]

The dilation of f is 1, and the nodes $(i, \beta)$ with $(i = 0)$ or $(i > 0 \wedge p(i) - p(i - 1) = 1)$ have load $2^{l-k}$ and those with $(i > 0 \wedge p(i) - p(i - 1) = 2)$ have load $2 \cdot 2^{l-k}$. (The proof of these claims is analogous to the proof of Theorem 2.)
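The first stage can be illustrated with a small Python sketch. It picks one admissible choice of p, namely p(i) = ceil(i·l/k), and derives the host-level loads and the A/B labelling of the guest levels from it. The function names and the ceiling rounding are assumptions made for illustration; the text only requires p(0) = 0, p(k − 1) = l − 1, and gaps of at most 2.

```python
def first_stage_levels(l, k):
    """One admissible choice of p(0) < ... < p(k-1) for 1 < l/k < 2.

    Assumption: p(i) = ceil(i * l / k), computed in integer arithmetic.
    """
    if not (k < l < 2 * k):
        raise ValueError("the first stage applies to 1 < l/k < 2 only")
    return [-(-i * l // k) for i in range(k)]


def first_stage_loads(l, k):
    """Load of host level i on each cycle of BFN(k) after the first stage."""
    p = first_stage_levels(l, k)
    unit = 2 ** (l - k)
    loads = [unit]  # host level 0 always has load 2^(l-k)
    for i in range(1, k):
        gap = p[i] - p[i - 1]
        # gap 1 -> load 2^(l-k); gap 2 -> load 2 * 2^(l-k)
        loads.append(unit if gap == 1 else 2 * unit)
    return loads


def node_types(l, k):
    """Label guest levels 0..l-1: 'A' if the level is some p(j), else 'B'."""
    chosen = set(first_stage_levels(l, k))
    return "".join("A" if i in chosen else "B" for i in range(l))


if __name__ == "__main__":
    for l, k in [(5, 3), (7, 4), (7, 5)]:
        print(l, k, first_stage_levels(l, k), first_stage_loads(l, k), node_types(l, k))
```

The loads on one host cycle always sum to l·2^(l−k), the number of guest nodes assigned to that cycle, which gives a quick sanity check on any other choice of p.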



Fig. 3. Segment A.

Second Stage: The nodes are locally rearranged between different cycles in order to improve the load of the embedding.

For this purpose, let us call a node $v = (i, a)$ in BFN(l) an A-node if $i \in \{p(0), p(1), \ldots, p(k - 1)\}$, that is, the cycle-edge

\[
(i, a) - ((i + 1) \bmod l,\ a)
\]

is mapped to a corresponding cycle-edge in BFN(k). Otherwise, v is called a B-node, and the cycle-edge

\[
(i, a) - ((i + 1) \bmod l,\ a)
\]

is mapped to a single node in BFN(k).

By choosing $p(0), p(1), \ldots, p(k - 1)$ appropriately in the first stage, one tries to partition the cycles of BFN(k) into certain segments in which the load can be rearranged between linked cycles.

Segment A

One type of such segment where a rearrangement is possible is a sequence of nodes BAABA as displayed in Figure 3. Prior nodes of BFN(l) are indicated by black dots and small letters, and image nodes in BFN(k), by capitals. Prior edges of BFN(l) are illustrated by lines, and edges in BFN(k) are equivalent to four parallel lines. Formally, we have

\[
\begin{aligned}
f(i_1, \varepsilon_j) &= f(i_2, \varepsilon_j) = (I_1, \Gamma_j),\\
f(i_3, \varepsilon_j) &= (I_2, \Gamma_j),\\
f(i_4, \varepsilon_j) &= f(i_5, \varepsilon_j) = (I_3, \Gamma_j),
\end{aligned}
\]

where f is the mapping after the first stage, and $\varepsilon$ is a synonym for $\alpha$, $\beta$, $\gamma$, or $\delta$.

Fig. 4. Rearrangement A, one element is transferred from $I_1$ to $I_2$ and from $I_3$ to $I_2$.

In the situation of Figure 3, the nodes on two groups of cycles $\{\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4\}$, $\{\lambda_1, \lambda_2, \lambda_3, \lambda_4\}$ (where $\varepsilon$, $\lambda$, $\varepsilon \ne \lambda$, stand for $\alpha$, $\beta$, $\gamma$, or $\delta$) can be moved as shown in Figure 4. [An arrow indicates that a prior vertex of BFN(l) is transferred from one node in BFN(k) to another.]

By applying Rearrangement A once or twice in the situation of Figure 3, the following results are obtained for each cycle $\Gamma_i$ in BFN(k):

• By rearrangements on 4 (2) prior cycles of BFN(l), 2



(1) elements of the upper and the lower, overloaded nodes $I_1$, $I_3$ in BFN(k) are transferred to the middle node $I_2$.

• Dilation 1 is kept up if the elements $i_1$ and $i_5$ on the cycles $\alpha_i$, $\beta_i$, $\gamma_i$, $\delta_i$, $1 \le i \le 4$, stay where they are. This means that on these cycles no movements are allowed directly above or underneath the shown section. But all the other cycles are not affected.

Segment B

Fig. 5. Segment B.

The second type of segment that we consider is a sequence of nodes BABA as displayed in Figure 5. (For an explanation of the figures below, confer the descriptions for Segment A.) Four kinds of moves can be constructed similarly to above (see Figs. 6–9), each affecting two prior cycles of BFN(l). Any two of them may also be combined.

Fig. 6. Rearrangement B1, one element is transferred from $I_2$ to $I_1$.

Fig. 7. Rearrangement B2, one element is transferred from $I_1$ to $I_2$ and from $I_2$ to $I_3$.

Fig. 8. Rearrangement B3, one element is transferred from $I_2$ to $I_1$ and from $I_1$ to $I_0$.

Again, dilation 1 is kept up if the elements $i_1$ and $i_4$ on the cycles $\alpha_i$, $\beta_i$, $\gamma_i$, $\delta_i$, $1 \le i \le 4$, stay where they are. This means that on these cycles no movements are allowed directly above or underneath the shown section. But all the other cycles are not affected.

A combination of the rearrangement techniques outlined above and an appropriate choice of $p(0), p(1), \ldots, p(k - 1)$ in the first stage lead to the desired load of the final embedding. From now on,

• [symbol] will always be associated with a node in BFN(k) of load $2 \cdot 2^{l-k}$, which is equivalent to a node sequence BA in BFN(l);

• [symbol] will be associated with a node in BFN(k) of load $2^{l-k}$, which is equivalent to an A-node in BFN(l).

We distinguish different cases:

(B1) $l/k \le \frac{5}{3}$:

I. $l - k$ even:

Let $l - k = 2i + 2$, $i \ge 0$. By the right choice of $p(0), \ldots, p(k - 1)$ in the first stage, each cycle in BFN(l) can be subdivided into a node sequence $(ABAAB)^{i+1} A^{*}$. The load is rearranged as follows:


Fig. 9. Rearrangement B4, one element is transferred from $I_1$ to $I_2$.

An arrow and a number next to it indicate how many nodes are moved from one place to another. In each section of the cycle in BFN(k), by using Rearrangement A $\frac{1}{3} 2^{l-k}$ times [thus affecting at most $2 \cdot \frac{1}{3} 2^{l-k} + 2 \le 2^{l-k}$ cycles of BFN(l)], a load of $\frac{5}{3} 2^{l-k}$ is achieved.

II. $l - k$ odd:

With similar techniques as in Case I, a load of $\frac{5}{3} 2^{l-k}$ can be derived. A detailed discussion of the different subcases can be found in [11]. In every case, it must be made sure that at most $2^{l-k}$ cycles of BFN(l) are affected by the rearrangements anywhere in the cycle in BFN(k).

(B2) $\frac{2p - 4}{p - 1} < \frac{l}{k} \le \frac{2p - 2}{p}$ for $p \in \{7, 8, \ldots\}$:

By choosing $p(i) := il/k$ for $0 \le i \le k - 1$ in the first stage, each cycle in BFN(l) is partitioned into sections of the following two types:

[Diagrams of the sections of Type 1 and Type 2]

where

\[
t_1 = \frac{p - 4}{4}, \qquad t_2 = \frac{p - 6}{4}, \qquad t_3 = \frac{p - 5}{4}, \qquad t_4 = \frac{p - 3}{4}.
\]

Let

\[
n = t_1 + t_2 + 3 = \frac{p}{2}, \qquad
m = 2 t_3 + 2 t_4 + 5 =
\begin{cases}
p & \text{if } p \text{ odd},\\
p - 1 & \text{if } p \text{ even}
\end{cases}
\]

be the lengths of the corresponding sections on the host cycle in BFN(k). It is shown below that the load can be distributed optimally in each of these sections, thus yielding a load of

• $[(2n - 1)/n]\, 2^{l-k}$ in sections of Type 1,
• $[(2m - 2)/m]\, 2^{l-k}$ in sections of Type 2.

Therefore, the whole embedding has load at most $[(2p - 2)/p]\, 2^{l-k}$.

Load Balancing in Sections of Type 1

In sections of Type 1, the load can be distributed evenly by shifting it from the overloaded nodes on the outside to the underloaded nodes in the middle. This is done by applying Rearrangements B1, . . . , B4 in the [symbol] parts and by using Rearrangement A in the [symbol] parts. Again, it must be ensured that at most $2^{l-k}$ cycles of BFN(l) are affected by the rearrangements anywhere in the cycle in BFN(k).

It turns out that the cases $n \in \{4r + 3, 4r + 4, 4r + 5, 4r + 6\}$, for $r \in \mathbb{N}_0$, have to be distinguished. Here, we only state the case $n = 4r + 3$. All the other cases work in a similar way (cf. [11]). In each case, the load derived is $[(2n - 1)/n]\, 2^{l-k}$.

Let $n = 4r + 3$, and let $(1/n)\, 2^{l-k}$ be abbreviated by L. Then, the load in each section can be balanced as follows:

If $r = 0$ (i.e., $l = 5$, $k = 3$), at most $2L + 2 = 4 \le 2^{l-k}$ cycles are affected by the rearrangements. If $r \ge 1$ (i.e., $l - k \ge 6$), at most $(2rL + 2) + (2(r + 1)L + 2) \le 2^{l-k}$ cycles are affected.

The load derived is $2 \cdot 2^{l-k} - L = [(2n - 1)/n]\, 2^{l-k}$.

Load Balancing in Sections of Type 2

In sections of Type 2, the load can be distributed evenly by shifting it in the same manner in the upper and the lower halves of each section, namely, by shifting it from the overloaded nodes on the outside of each half to the underloaded nodes in the middle. The way this is done has already been explained for sections of Type 1.

It turns out that the cases $m \in \{8r + 7, 8r + 9, 8r + 11, 8r + 13\}$, for $r \in \mathbb{N}_0$, have to be distinguished. Here, we only state the case $m = 8r + 7$. All the other cases work in a similar way (cf. [11]). In each case, the load derived is $[(2m - 2)/m]\, 2^{l-k}$.

Let $m = 8r + 7$, and let $(2/m)\, 2^{l-k}$ be abbreviated by L. Then, the load in each section can be balanced as follows:

As $(2(r + \frac{1}{2})L + 2) + (2(r + 1)L + 2) \le 2^{l-k}$ (as $l - k \ge 5$), at most $2^{l-k}$ cycles are affected.

The load derived is $2 \cdot 2^{l-k} - L = [(2m - 2)/m]\, 2^{l-k}$.

4. EXPERIMENTAL RESULTS

To validate the theoretical results, some experiments were carried out on a system of T800 Transputers. These processors provide four communication links that can be used to connect a number of processors to build up a parallel computer system. Using these communication links, arbitrary network structures can be implemented. Thus, we used the Transputers to build up a parallel computer system having a CCC network architecture. To program a distributed system like that, one has to specify a so-called configuration map of the network topology. This map describes the mapping of the process/task graph (logical structure) of a parallel program onto the processor network (physical structure). More precisely, the configuration map specifies the process–processor mapping, the logical communication channels (between the processes), and the mapping of the logical communication channels (between the processes) to the physical communication channels (the link connections between the processors).

For our experiments, we used a CCC as the physical network structure as well as the logical network structure of the parallel program. The task of the compression tool that implements the methods of the previous sections is



to map the larger CCC structure of the parallel program to the smaller CCC structure of the parallel machine. Our compression tool takes any configuration map for a logical CCC network of dimension l and builds a configuration map for a physical CCC network of any dimension $k \le l$ by applying the algorithms of Section 3.

Since the edge congestion and dilation are greater than one in some cases, we have to integrate multiplexing and routing processes in the program for the target CCC. This is done automatically by our tool. The user only has to specify the configuration map of his logical CCC and to insert the dimension of his physical CCC. So, a number of user processes are mapped to one processor and run in parallel to routing and multiplexing processes. By this compression, many adjacent processes are mapped to the same processor, which results in a faster communication between them.

To measure the overhead which is caused by the additional routing and multiplexing processes, we first wrote different user programs $P_l$ consisting of $l \cdot 2^l$ processes configured as a CCC of dimension l. Every program performs x iterations, each one consisting of y internal dummy operations and one communication with another process of the logical CCC. We executed program $P_3$ on a physical CCC(3) and the compressed version of $P_3$ on a physical CCC(2). We also ran $P_2$ on a physical CCC(2) and the compressed version of $P_2$ on a physical CCC(1).

Table I shows the execution times in seconds and the resulting overhead for different x and y. One can see that the overhead is even better than the optimal load factor for the corresponding embedding [i.e., 4 when embedding CCC(2) into CCC(1) and 3 when embedding CCC(3) into CCC(2)], if there is a large amount of communication between the processes. This is because communication between processes on the same processor is approximately 2.5 times as fast as communication between linked processors.

TABLE I. Compressing an arbitrary distributed algorithm

         y        x     CCC(2)   CCC(2 → 1)   Overhead     CCC(3)   CCC(3 → 2)   Overhead
10,000,000        1     280.19      1160.27      4.141     855.36      2566.55      3.000
 1,000,000       10     280.43      1161.22      4.140     857.00      2573.24      3.002
   100,000      100     282.83      1172.08      4.144     873.44      2629.53      3.010
    10,000     1000     307.00      1276.25      4.157    1030.26      3189.78      3.096
      1000     1000      57.64       233.31      4.047     335.86       900.26      2.680
       100   10,000     439.48      1333.64      3.034    3189.14      8341.35      2.615
        10   10,000     433.57      1268.10      2.924    3182.23      8339.98      2.620
         1  100,000    4328.18     12641.41      2.920   31850.16     83329.14      2.616

5. CONCLUSION

In this paper, new solutions to the problem of simulating large networks on smaller ones are proposed for the cube-connected cycles (CCC) and the butterfly network (BFN), two classes of networks which are of major importance for practical applications. It is demonstrated that the problem can be solved very efficiently for these networks, that is, compared to the inherent slowdown, the simulation only takes a little amount of additional running time (due to a communication overhead or to an imbalanced distribution of the workload).

For an analysis, the network simulation problem is modeled as a graph embedding problem, transferring the desired properties of the simulation into properties of the corresponding embedding (dilation, load, edge congestion). A number of embeddings are presented with small dilation and/or small load, leading to the main simulation result above. In addition to these properties, we show that our techniques behave very well for real applications.

Therefore, one might say as a summary that many fundamental applications for large CCCs and BFNs can be implemented very efficiently on a network of realistic size by using the results of this paper, thus underlining the fact that the CCC and the BFN are very powerful interconnection structures for computer architectures in parallel processing.

An interesting question to investigate is whether the nonoptimal dilation 1 embeddings of large CCCs and BFNs (of dimension l) into smaller ones (of dimension k) can still be improved with respect to their load or whether the embeddings presented can be shown to be optimal. In some special cases, for example, $(l, k) = (8, 5)$ for the CCC and $(l, k) = (3, 2)$, $(l, k) = (6, 4)$ for the BFN, it is possible to achieve optimum load by applying similar techniques as described in this paper (see [11]). We are also investigating a new idea which might lead to an improvement in many cases when embedding CCC(l) into CCC(k), $l/k < 2$, for example, for $(l, k) = (11, 7)$. But especially for the BFN, even for simple cases such as the embedding of BFN(4) (with 64 processors) into BFN(3) (with 24 processors), it is not clear whether the load can be improved any further. Finally, a further study should also consider the edge congestion of the embeddings and try to minimize it while keeping up the same dilation and load.
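To see how far the dilation 1 embeddings of Theorem 3 are from optimum load for pairs such as those mentioned above, the bound can be evaluated with exact fractions. The following Python sketch is an illustration only: the function names are hypothetical, the loads are kept as unrounded fractions, and the section lengths n and m of case (B2) are computed under the assumption that $t_1, \ldots, t_4$ denote integer parts, which the text does not state explicitly.

```python
from fractions import Fraction


def theorem3_factor(l, k):
    """Multiplier c such that Theorem 3 states a load of c * 2^(l-k)."""
    if not (l > k >= 1):
        raise ValueError("need integers l > k >= 1")
    ratio = Fraction(l, k)
    if ratio >= 2:                  # case (A): optimum load
        return ratio
    if ratio <= Fraction(5, 3):     # case (B1)
        return Fraction(5, 3)
    p = 7                           # case (B2): smallest p with l/k <= (2p-2)/p
    while ratio > Fraction(2 * p - 2, p):
        p += 1
    return Fraction(2 * p - 2, p)


def theorem3_load(l, k):
    """Load bound of Theorem 3 as an exact (unrounded) fraction."""
    return theorem3_factor(l, k) * 2 ** (l - k)


def optimum_load(l, k):
    """Optimum load (l/k) * 2^(l-k) as an exact (unrounded) fraction."""
    return Fraction(l, k) * 2 ** (l - k)


def section_lengths(p):
    """Lengths (n, m) of Type 1 / Type 2 sections in case (B2).

    Assumption: t_1..t_4 are integer parts of (p-4)/4, (p-6)/4, (p-5)/4, (p-3)/4.
    """
    t1, t2 = (p - 4) // 4, (p - 6) // 4
    t3, t4 = (p - 5) // 4, (p - 3) // 4
    return t1 + t2 + 3, 2 * t3 + 2 * t4 + 5


if __name__ == "__main__":
    for l, k in [(3, 2), (4, 3), (6, 4), (7, 4), (8, 4)]:
        print((l, k), "Theorem 3:", theorem3_load(l, k), "optimum:", optimum_load(l, k))
```

Under the stated reading of $t_1, \ldots, t_4$, the per-section loads $(2n - 1)/n$ and $(2m - 2)/m$ never exceed $(2p - 2)/p$, which matches the bound claimed for the whole embedding in case (B2).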



The authors would like to thank the anonymous referees for their comments which helped to improve the presentation of the paper. This work was partially supported by Grants Mo 285/9-1 and Me 872/6-1 (Leibniz Award) of the German Research Association (DFG), and by the ESPRIT Basic Research Action No. 7141 (ALCOM II) and No. 20244 (ALCOM IT).

REFERENCES

[1] M. J. Atallah and S. R. Kosaraju, Optimal simulations between mesh-connected arrays of processors. J. ACM 35 (1988) 635–650.

[2] F. Berman and L. Snyder, On mapping parallel algorithms into parallel architectures. J. Parallel Distrib. Comput. 4 (1987) 439–458.

[3] S. N. Bhatt, F. R. K. Chung, J.-W. Hong, F. T. Leighton, and A. L. Rosenberg, Optimal simulations by butterfly networks. Proceedings of the 20th ACM Symposium on the Theory of Computing (1988) 192–204.

[4] H. L. Bodlaender, The classification of coverings of processor networks. J. Parallel Distrib. Comput. 6 (1989) 166–182.

[5] H. L. Bodlaender and J. van Leeuwen, Simulation of large networks on smaller networks. Infor. Control 71 (1986) 143–180.

[6] J. Ellis, Z. Miller, and I. H. Sudborough, Compressing meshes into small hypercubes. Technical Report, Computer Science Program, University of Texas at Dallas, Richardson, TX 75083-0688.

[7] M. R. Fellows, Encoding graphs in graphs. PhD Dissertation, University of California at San Diego (1985).

[8] R. Feldmann and W. Unger, The cube-connected-cycle is a subgraph of the butterfly network. Parallel Process. Lett. 2 (1992) 13–19.

[9] J. P. Fishburn and R. A. Finkel, Quotient networks. IEEE Trans. Comput. C-31 (1982) 288–295.

[10] A. K. Gupta and S. E. Hambrusch, Embedding large tree machines into small ones. Proceedings of the 5th MIT Conference on Advanced Research in VLSI (1988) 179–198.

[11] R. Klasing, Simulating large cube-connected cycles and large butterfly networks on smaller ones. Master Thesis, Universität-GH Paderborn, Fachbereich 17 - Mathematik/Informatik, Germany (1990).

[12] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, trees, hypercubes. Morgan Kaufmann Publishers, San Mateo, California (1992).

[13] R. Lüling and B. Monien, Two strategies for solving the vertex cover problem on a transputer network. 3rd International Workshop on Distributed Algorithms, LNCS 392 (1989) 160–170.

[14] B. Monien, Simulating binary trees on X-trees. Proceedings of the 3rd ACM Symposium on Parallel Algorithms and Architectures (SPAA 91) (1991) 147–158.

[15] B. Monien and I. H. Sudborough, Embedding one interconnection network in another. Computing (Suppl. 7) (1990) 257–282.

[16] P. A. Nelson and L. Snyder, Programming solutions to the algorithm contraction problem. Proceedings of the 1986 International Conference on Parallel Processing, 258–261.

[17] R. Peine, Cayley-Graphen und Netzwerke. Master Thesis, Universität-GH Paderborn, Fachbereich 17 - Mathematik/Informatik, Germany (1990).

[18] F. P. Preparata and J. E. Vuillemin, The cube-connected cycles: A versatile network for parallel computation. Commun. ACM 24 (1981) 300–309.

[19] A. G. Ranade, How to emulate shared memory. J. Comput. Syst. Sci. 42 (1991) 307–326.

[20] A. L. Rosenberg, Graph embeddings 1988: Recent breakthroughs, new directions. Proceedings of the 3rd Aegean Workshop on Computing (AWOC): VLSI Algorithms and Architectures (LNCS 319) (1988) 160–169.

[21] J. T. Schwartz, Ultracomputers. ACM Trans. Program. Lang. Syst. 2 (1980) 484–521.

[22] H. S. Stone, Parallel processing with the perfect shuffle. IEEE Trans. Comput. C-20 (1971) 153–161.

