Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995

Fast Algorithms for Finding O(Congestion+Dilation) Packet Routing Schedules

Tom Leighton¹
Laboratory for Computer Science and Mathematics Department
Massachusetts Institute of Technology
Cambridge, MA 02139
[email protected]

Bruce Maggs²
Computer Science Department, School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
[email protected]

¹Tom Leighton is supported in part by Air Force Contract AFOSR F49620-92-J-0125 and ARPA Contracts N00014-91-J-1698, N00014-92-J-1799, and F33615-93-1-1330.

²Bruce Maggs is supported in part by an NSF National Young Investigator Award and by ARPA Contracts F33615-93-1-1330, N00014-91-J-1698, and N00014-92-J-1799.

Abstract

In 1988, Leighton, Maggs, and Rao showed that for any network and any set of packets whose paths through the network are fixed and edge-simple, there exists a schedule for routing the packets to their destinations in O(c + d) steps using constant-size queues, where c is the congestion of the paths in the network, and d is the length of the longest path. The proof, however, used the Lovász Local Lemma and was not constructive. In this paper, we show how to find such a schedule in O(NE + E log^ε E) time, for any fixed ε > 0, where N is the total number of packets, and E is the number of edges in the network. We also show how to parallelize the algorithm so that it runs in NC. The method that we use to construct efficient packet routing schedules is based on the algorithmic form of the Lovász Local Lemma discovered by Beck.

1 Introduction

The problem of efficiently routing packets of data through a network is central to the design and use of large-scale parallel and distributed systems. A myriad of schemes have been developed to solve packet routing problems in a wide variety of contexts, ranging in scope from the randomized routing algorithm used on the fat-tree network in a Connection Machine CM-5 to the routing protocols being developed for the proposed National Information Infrastructure (NII) electronic superhighway. (For a survey of some of the approaches to the problem in the High Performance Computing (HPC) domain, we refer the reader to [8, 11].)



1.1 The scheduling problem

In this paper, we consider the problem of scheduling the movements of packets whose paths through a network have already been determined. The problem is formalized as follows. We are given a network with V nodes (switches) and E edges (channels). Each node can serve as the source or destination of an arbitrary number of messages. Each message consists of an arbitrary number of packets (or cells or flits, as they are sometimes referred to). Let N denote the total number of packets to be routed. (In a dynamic setting, N would denote the rate at which packets enter the network. For simplicity, we will consider a static scenario in which a total of N packets are to be routed through the network.) The goal is to route the N packets from their origins to their destinations via a series of synchronized time steps, where at each step at most one packet can traverse each edge.
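As a concrete illustration of this model (the representation and identifiers below are ours, not from the paper), the following sketch encodes a schedule as a list of per-step moves and checks the one constraint stated above: at most one packet traverses each edge in a step.

```python
# Illustrative sketch: verify that a schedule moves at most one packet
# across any edge in a single synchronized time step.
from collections import Counter

def is_valid_step_schedule(schedule):
    """schedule: list indexed by time step; each entry is a list of
    (packet_id, edge) pairs, meaning that packet crosses that edge now."""
    for moves in schedule:
        edge_use = Counter(edge for _, edge in moves)
        if any(count > 1 for count in edge_use.values()):
            return False  # two packets would need the same edge in one step
    return True

# Two packets sharing edge ('a', 'b') must cross it at different steps.
assert is_valid_step_schedule([[(1, ('a', 'b'))], [(2, ('a', 'b'))]])
assert not is_valid_step_schedule([[(1, ('a', 'b')), (2, ('a', 'b'))]])
```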

Figure 1 shows a 5-node network in which one packet is to be routed to each node. The shaded nodes in the figure represent switches, and the edges between the nodes represent channels. A packet is depicted as a square box containing the label of its destination.

During the routing, packets wait in three different kinds of queues. Before the routing begins, packets are stored at their origins in special initial queues. When a packet traverses an edge, it enters the edge queue at the end of that edge. A packet can traverse an edge only if at the beginning of the step, the edge queue at the end of that edge is not full. Upon traversing the last edge on its path, a packet is removed from the edge queue and placed in a special final queue at its destination.


Figure 1: A graph model for packet routing.

In Figure 1, all of the packets reside in initial queues. For example, packets 4 and 5 are stored in the initial queue at node 1. In this example, each edge queue is empty, but has the capacity to hold two packets. Final queues are not shown in the figure. Independent of the routing algorithm used, the sizes of the initial and final queues are determined by the particular packet routing problem to be solved. Thus, any bound on the maximum queue size required by a routing algorithm refers only to the edge queues.

Figure 2: A set of paths for the packets with dilation d = 3 and congestion c = 3.


This paper focuses on the problem of timing the movements of the packets along their paths. A schedule for a set of packets specifies which packets move and which wait at each time step. Given any underlying network, and any selection of paths for the packets, our goal is to produce a schedule for the packets that minimizes the total time and the maximum queue size needed to route all of the packets to their destinations. We would also like to ensure that any two packets traveling along the same path to the same destination always proceed in order.

Given any set of paths with congestion c and dilation d in any network, it is straightforward to route all of the packets to their destinations in cd steps using queues of size c at each edge. Since the queues are big enough that packets can never be delayed by a full queue in front, each packet can be delayed at most c − 1 steps at each of at most d edges on the way to its destination.

Of course, there is a strong correlation between the time required to route the packets and the selection of the paths. In particular, the maximum distance, d, traveled by any packet is always a lower bound on the time. We call this distance the dilation of the paths. Similarly, the largest number of packets that must traverse a single edge during the entire course of the routing is a lower bound. We call this number the congestion, c, of the paths. Figure 2 shows a set of paths for the packets of Figure 1 with dilation 3 and congestion 3.
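Both lower bounds can be read directly off the paths. The sketch below (an illustration with our own identifiers, not code from the paper) takes each path as a list of edges and returns the congestion c and dilation d.

```python
# Illustrative sketch: congestion c and dilation d of a set of edge paths.
from collections import Counter

def congestion_and_dilation(paths):
    """paths: dict packet_id -> list of edges (an edge is any hashable value)."""
    edge_load = Counter(edge for path in paths.values() for edge in path)
    c = max(edge_load.values(), default=0)                        # busiest edge
    d = max((len(path) for path in paths.values()), default=0)    # longest path
    return c, d

# A tiny synthetic example (the paths of Figure 2 would give c = 3, d = 3).
paths = {1: [('a', 'b'), ('b', 'c')], 2: [('a', 'b')], 3: [('b', 'c'), ('c', 'd')]}
print(congestion_and_dilation(paths))  # -> (2, 2)
```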

1.2 Previous and related work

In [10], Leighton, Maggs, and Rao showed that there are much better schedules. In particular, they established the existence of a schedule using O(c + d) steps and constant-size queues at every edge, thereby achieving the naive lower bounds for any routing problem. The result is highly robust in the sense that it works for any set of edge-simple paths and any underlying network. (A priori, it would be easy to imagine that there might be some set of paths on some network that required more than Ω(c + d) steps or greater than constant-size queues to route all the packets.) The method that they used to show the existence of optimal schedules, however, is not constructive. In other words, the best known algorithms for producing schedules require time that is exponential in the number of packets.


For the class of leveled networks, Leighton, Maggs, Ranade, and Rao [9] showed that there is a simple on-line randomized algorithm for routing the packets to their destinations within O(c + L + log N) steps, with high probability, where L is the number of levels in the network, and N is the total number of packets. (In a leveled network with L levels, each node is labeled with a level number between 1 and L, and every edge that has its tail on level i has its head on level i + 1, for 1 ≤ i < L.)



Mansour and Patt-Shamir [12] then showed that if packets are routed greedily on shortest paths, then all of the packets reach their destinations within d + N steps, where N is the total number of packets. These schedules may be much longer than optimal, however, because N may be much larger than c.

Recently Meyer auf der Heide and Vöcking [13] devised a simple on-line randomized algorithm that routes all packets to their destinations in O(c + d + log N) steps, with high probability, provided that the paths taken by the packets are short-cut free (e.g., shortest paths).

1.3 Our results

In this paper, we show how to produce schedules of length O(c + d) in O(NE + E log^ε E) time, for any constant ε > 0. The schedules can also be found in polylogarithmic time on a parallel computer using O(NE + E log^ε E) work.

The algorithm for producing the schedules is based on an algorithmic form of the Lovász Local Lemma [6], [14, pp. 57–58] discovered by Beck [3]. Showing how to modify Beck's arguments so that they can be applied to load balancing problems in networks is the main contribution of the paper. Once this is done, the construction of optimal routing schedules is accomplished using the methods of [10].

The result has several applications. For example, if a particular routing problem is to be performed many times over, then it may be feasible to compute the optimal schedule once using global control. This situation arises in network emulation problems. Typically, a guest network G is emulated by a host network H by embedding G into H. (For a more complete discussion of emulations and embeddings, see [7].) An embedding maps nodes of G to nodes of H, and edges of G to paths in H. There are three important measures of an embedding: the load, congestion, and dilation. The load of an embedding is the maximum number of nodes of G that are mapped to any one node of H. The congestion is the maximum number of paths corresponding to edges of G that use any one edge of H. The dilation is the length of the longest path. Let l, c, and d denote the load, congestion, and dilation of the embedding. Once G has been embedded in H, H can emulate G in a step-by-step fashion. Each node of H first emulates the local computations performed by the l (or fewer) nodes mapped to it. This takes O(l) time. Then for each packet sent along an edge of G, H sends a packet along the corresponding path in the embedding. The algorithm described in this paper can be used to produce a schedule in which the packets are routed to their destinations in O(c + d) steps. Thus, H can emulate each step of G in O(l + c + d) steps.

The result also has applications to job-shop scheduling. In particular, consider a scheduling problem with jobs j₁, ..., j_n and machines m₁, ..., m_m, for which each job must be performed on a specified sequence of machines. In this application, we assume that each job occupies each machine that works on it for a unit of time, and that no machine has to work on any job more than once. Of course, the jobs correspond to packets, and the machines correspond to edges in the packet routing problem. Hence, we can define the dilation of the scheduling problem to be the maximum number of machines that must work on any job, and the congestion to be the maximum number of jobs that have to be run on any machine. As a consequence of the packet routing result, we know that any scheduling problem can be solved in O(c + d) steps. In addition, we know that there is a schedule for which each job waits at most O(c + d) steps before it starts running, and that each job waits at most a constant number of steps in between consecutive machines. The queue of jobs waiting for any machine will also always be at most a constant.
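The correspondence is mechanical, as the small sketch below illustrates (the instance and identifiers are ours): viewing jobs as packets and machines as edges, the dilation and congestion of a job-shop instance are just the longest job and the busiest machine.

```python
# Illustrative sketch: dilation and congestion of a job-shop instance,
# viewing jobs as packets and machines as edges.
from collections import Counter

jobs = {  # job -> sequence of machines it visits (each machine at most once)
    "j1": ["m1", "m2", "m3"],
    "j2": ["m2", "m3"],
    "j3": ["m1", "m3"],
}
dilation = max(len(machines) for machines in jobs.values())   # longest job
congestion = max(Counter(m for seq in jobs.values() for m in seq).values())
print(congestion, dilation)  # -> 3 3; an O(c + d)-step schedule is guaranteed to exist
```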

1.4 Outline


The remainder of the paper is divided into sections as follows. In Section 2, we briefly summarize the nonconstructive method for producing routing schedules described in [10]. In Section 3, we describe how to make the method constructive. In Section 4, we show how to parallelize the scheduling algorithm. We conclude with some remarks in Section 5.

2 A brief review of the nonconstructive method

In [10], Leighton, Maggs, and Rao proved that for any set of packets whose paths are edge-simple¹ and have congestion c and dilation d, there is a schedule of length O(c + d) in which at most one packet traverses each edge of the network at each step, and at most a constant number of packets wait in each queue at each step. Note that there are no restrictions on the size, topology, or degree of the network or on the number of packets.

¹An edge-simple path uses no edge more than once.



Instead of describing the O(c + d) result, we will show that there is a schedule of length (c + d)2^{O(log* (c+d))} that uses queues of size log(c + d)2^{O(log* (c+d))}. This preliminary result is substantially simpler to prove because of the relaxed bounds on the schedule length and queue size. Nevertheless, it illustrates the basic ideas necessary to prove the O(c + d) result.

The strategy for constructing an efficient schedule is to make a succession of refinements to the "greedy" schedule, S₀, in which each packet moves at every step until it reaches its final destination. This initial schedule is as short as possible; its length is only d. Unfortunately, as many as c packets may have to use an edge at a single time step in S₀, whereas in the final schedule at most one packet is allowed to use an edge at each step. Each refinement will bring us closer to meeting this requirement by bounding the congestion within smaller and smaller frames of time.

The proof uses the Lovász Local Lemma [6], [14, pp. 57–58] at each refinement step. Given a set of "bad" events in a probability space, the lemma provides a simple inequality which, when satisfied, guarantees that with probability greater than zero, no bad event occurs. The inequality relates the probability that each bad event occurs with the dependence among them. A set of events A₁, ..., A_m in a probability space has dependence at most b if every event is mutually independent of some set of m − b − 1 other bad events. The lemma is nonconstructive; for a discrete probability space, it shows only that there exists some elementary outcome that is not in any bad event.

Lemma 2.1 (Lovász) Let A₁, ..., A_m be a set of "bad" events, each occurring with probability p, with dependence at most b. If 4pb < 1, then with probability greater than zero, no bad event occurs. □

Before proceeding, we need to introduce some notation. A T-frame is a sequence of T consecutive time steps. The frame congestion, C, in a T-frame is the largest number of packets that traverse any edge in the frame. The relative congestion, R, in a T-frame is the ratio C/T of the congestion in the frame to the size of the frame.
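For a schedule in hand, these quantities are straightforward to compute. The sketch below (our own representation: each packet is a list of (edge, time) crossings) returns the worst relative congestion over all T-frames for a given T.

```python
# Illustrative sketch: worst relative congestion R = C/T over all T-frames,
# for a schedule given as per-packet (edge, crossing_time) pairs.
from collections import defaultdict

def max_relative_congestion(crossings, T, length):
    """crossings: dict packet -> list of (edge, time); length: schedule length."""
    times_by_edge = defaultdict(list)
    for moves in crossings.values():
        for edge, t in moves:
            times_by_edge[edge].append(t)
    worst = 0
    for start in range(length - T + 1):                     # every T-frame [start, start+T)
        for times in times_by_edge.values():
            C = sum(start <= t < start + T for t in times)  # frame congestion on this edge
            worst = max(worst, C)
    return worst / T

crossings = {1: [(('a', 'b'), 0), (('b', 'c'), 1)], 2: [(('a', 'b'), 2)]}
print(max_relative_congestion(crossings, T=2, length=4))  # -> 0.5
```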

Lemma 2.2 [10] For any set of packets whose paths are edge-simple and have congestion c and dilation d, there is a schedule of length O(c + d) in which packets never wait in edge queues and in which the relative congestion in any frame of size log d or greater is at most 1.

Proof: The proof uses the Lovász Local Lemma. The first step is to assign an initial delay to each packet. Without loss of generality, we assume that c = d. The delays are chosen from the range [1, αd], where α is a fixed constant that will be determined later. In the resulting schedule, S₁, a packet that is assigned a delay of x waits in its initial queue for x steps, then moves on to its destination without waiting again until it enters its final queue. The length of S₁ is at most (1 + α)d. We use the Lovász Local Lemma to show that if the delays are chosen randomly, independently, and uniformly, then with nonzero probability the relative congestion in any frame of size log d or greater is at most 1. Thus, such a set of delays must exist.

To apply the Lovász Local Lemma, we associate a bad event with each edge. The bad event for edge g is that more than T packets use g in some T-frame, for T ≥ log d. To show that there is a way of choosing the delays so that no bad event occurs, we need to bound the dependence, b, among the bad events and the probability, p, of each individual bad event occurring.

The dependence calculation is straightforward. Whether or not a bad event occurs depends solely on the delays assigned to the packets that pass through the corresponding edge. Thus, two bad events are independent unless some packet passes through both of the corresponding edges. Since at most c packets pass through an edge, and each of these packets passes through at most d other edges, the dependence, b, of the bad events is at most cd = d².

Computing the probability of each bad event is a little trickier. Let p be the probability of the bad event corresponding to edge g. Then

p ≤ Σ_{T = log d}^{d} (1 + α)d (d choose T) (T/αd)^T .

This expression is derived as follows. Frames of size greater than d cannot have relative congestion greater than 1, since the total congestion is only d. Thus, we can ignore them. We bound the probability that any frame of size log d or greater has relative congestion greater than 1 by summing, over all frame sizes T from log d to d, the probability that some T-frame has relative congestion greater than 1. Furthermore, for any T, there are at most (1 + α)d different T-frames, and we bound the probability that any one of them has relative congestion greater than 1 by summing their individual probabilities. The number of packets passing through g in any T-frame has a binomial distribution. There are d independent Bernoulli trials, one for each packet that uses g.


Figure 3: Schedule S₁. The schedule is derived from the greedy schedule, S₀, by assigning an initial delay in the range [1, αd] to each packet. We use the Lovász Local Lemma to show that within each log d-frame, at most log d packets pass through each edge.

Since at most T of the possible αd delays will actually send a packet through g in the frame, each trial succeeds with probability T/αd. (Here we use the assumption that the paths are edge-simple.) The probability of more than T successes is at most (d choose T)(T/αd)^T.
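One way to finish the estimate (a sketch, assuming α > e and writing log for log₂) is to use the bound (d choose T) ≤ (ed/T)^T, so that each summand is at most (1 + α)d(e/α)^T and

\[
p \;\le\; (1+\alpha)\,d \sum_{T \ge \log d} \Bigl(\frac{e}{\alpha}\Bigr)^{T}
  \;\le\; \frac{(1+\alpha)\,d}{1 - e/\alpha}\,\Bigl(\frac{e}{\alpha}\Bigr)^{\log d}
  \;=\; \frac{(1+\alpha)\,d}{1 - e/\alpha}\, d^{-\log(\alpha/e)} .
\]

Since b ≤ d², choosing α large enough that log(α/e) ≥ 4 makes 4pb = O(d^{−1}), which is less than 1 once d exceeds a fixed constant (smaller d can be handled directly).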

For a sufficiently large, but fixed, α the product 4pb is less than 1, and thus, by the Lovász Local Lemma, there is some assignment of delays such that the relative congestion in any frame of size log d or greater is at most 1. □

Theorem 2.3 [10] For any set of packets whose paths are edge-simple and have congestion c and dilation d, there is a schedule having length (c + d)2^{O(log* (c+d))} and maximum queue size log(c + d)2^{O(log* (c+d))} in which at most one packet traverses each edge at each step.

Proof: For simplicity, we shall assume without loss of generality that c = d, so that the bounds on the length and queue size are d·2^{O(log* d)} and (log d)·2^{O(log* d)}, respectively.

The proof has the following outline. We begin by using Lemma 2.2 to produce a schedule S₁ in which the number of packets that use an edge in any log d-frame is at most log d. Next we break the schedule into (1 + α)d / log d frames of size log d, as shown in Figure 3. Finally, we view each log d-frame as a routing problem with dilation log d and congestion log d, and solve it recursively.

Each log d-frame in S₁ can be viewed as a separate scheduling problem where the origin of a packet is its location at the beginning of the frame, and its destination is its location at the end of the frame. If at most log d packets use each edge in a log d-frame, then the congestion of the problem is log d. The dilation is also log d because in log d time steps a packet can move a distance of at most log d. In order to schedule each frame independently, a packet that arrives at its destination before the last step in the rescheduled frame is forced to wait there until the next frame begins.

All that remains is to bound the length of the schedule and the size of the queues. The recursion proceeds to a depth of O(log* d), at which point the frames have constant size, and at most a constant number of packets use each edge in each frame. The resulting schedule can be converted to one in which at most one packet uses each edge in each time step by slowing it down by a constant factor. Since the length of the schedule increases by a constant factor during each recursive step, the length of the final schedule is d·2^{O(log* d)}. The bound on the queue size follows from the observation that no packet waits at any one spot (other than its origin or destination) for more than (log d)·2^{O(log* d)} consecutive time steps, and in the final schedule at most one packet traverses each edge at each time step. □
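The 2^{O(log* d)} factor can be traced numerically. The toy sketch below (the per-level stretch factor is an assumed constant, not a value from the paper) follows the frame size d → log d → log log d → ⋯ and multiplies the schedule length by that constant at each level; the number of levels is essentially log* d.

```python
# Illustrative sketch: the frame size drops d -> log d -> log log d -> ...,
# and the schedule stretches by a constant factor per level, so the final
# length is d * stretch^(number of levels) = d * 2^(O(log* d)).
import math

def final_length_bound(d, stretch=4, base=4):
    length, frame, levels = float(d), float(d), 0
    while frame > base:
        frame = math.log2(frame)   # next level reschedules frames of size log(frame)
        length *= stretch          # assumed constant-factor slowdown per level
        levels += 1
    return length, levels          # levels is essentially log*(d)

for d in (16, 2**16, 2.0**64):
    print(d, final_length_bound(d))
```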

Proving that there is a schedule of length O(c + d) using constant-size queues is more difficult. Removing the 2^{O(log* (c+d))} factor in the length of the schedule requires tighter probability calculations, and reducing the queue size to a constant mandates greater care in spreading delays out over the schedule. The details can be found in [10].

3 An algorithm for constructing optimal schedules

In this section, we describe the key ideas required to make the nonconstructive proof of [10] constructive. There are many details in that proof, but changes are required only where the Lovász Local Lemma is used, in Lemmas 3.7 and 3.9 of [10]. These lemmas show that a schedule can be modified by assigning delays to the packets in such a way that in the new schedule the relative congestion can be bounded in much smaller frames than in the old schedule. In this paper, we show how to find the assignment of delays quickly. We will neither regurgitate the entire proof, nor reprove Lemmas 3.7 and 3.9, because doing so would require explaining so many details of the original proof that the new technique in this paper would be obscured. Instead we will prove just one representative theorem, Theorem 3.2, in the hope that a reader who understands the theorem and the proofs of Lemmas 3.7 and 3.9 of [10] can easily see how to make the original proof constructive.

The following lemma is used in the proof of the theorem. It shows that if we can bound the relative congestion in frames of size T through 2T − 1, then we can bound the relative congestion in all frames of size T or greater.



Lemma 3.1 [10] In any schedule, if the number of packets that use a particular edge g in any y-frame is at most Ry, for all y between T and 2T − 1, then the number of packets that use g in any y-frame is at most Ry for all y ≥ T.

Proof: Consider a frame τ of size T′, where T′ > 2T − 1. The first (⌊T′/T⌋ − 1)T steps of the frame can be broken into T-frames. In each of these frames, at most RT packets use g. The remainder of the T′-frame τ consists of a single y-frame, where T ≤ y ≤ 2T − 1, in which at most Ry packets use g. □

3.1 A representative theorem

Theorem 3.2 Consider a schedule of length I² in which the relative congestion in any frame of size I or greater is at most r, where 1 ≤ r ≤ I. Then there is some way of assigning initial delays in the range [1, I] to the packets so that in the resulting schedule, the relative congestion in any frame of size log² I or greater is at most r′, where r′ = r(1 + σ) and σ = O(1)/√(log I). Furthermore, with high probability, this assignment can be found in O(NE + E log^ε E) time, for any fixed ε > 0.

Proof: The proof begins by using the Lovász Local Lemma to show that such an assignment of initial delays exists.

With each edge and each time frame of size log² I through 2 log² I − 1, we associate a bad event. The bad event for an edge g and a particular T-frame τ occurs when more than r′T packets use edge g during frame τ. If no bad event occurs, then by Lemma 3.1, the relative congestion in all frames of size log² I or greater will be at most r′. Since there are log² I different frame sizes and there are at most I² + I different frames of any particular size, the total number of bad events involving any one edge is at most (I² + I) log² I < I³ (for I > 2). We show that if each packet is assigned a delay chosen randomly, independently, and uniformly from the range [1, I], then with non-zero probability no bad event occurs.

In order to apply the Lovász Local Lemma, we must bound both the dependence of the bad events and the probability that any bad event occurs. The dependence calculation is straightforward. Whether or not a bad event for an edge g and a time frame τ occurs depends solely on the assignment of delays to the packets that pass through g. Thus, the bad event for an edge g and a time frame τ and the bad event for an edge g′ and a time frame τ′ are dependent only if g and g′ share a packet. Since at most rI² packets pass through g, and each of these packets passes through at most I² other edges g′, and there are at most I³ time frames τ′, the dependence b is at most rI⁷. For r ≤ I, b ≤ I⁸.

The probability calculation is a little more involved. The number of packets that use an edge g during a particular T-frame τ has a binomial distribution. In the new schedule, a packet can use g during τ only if in the original schedule it used g during τ or during one of the I steps before the start of τ. Since the relative congestion in any frame of size I or greater in the original schedule is at most r, there are at most r(I + T) such packets. The probability that an individual packet that could use g during τ actually does so is at most T/I. Thus, the probability that s or more packets use an edge g during a particular T-frame τ is at most (r(I + T) choose s)(T/I)^s.

To estimate the area under the tails of this binomial distribution, we use the following Chernoff-type bound [5]. Suppose that there are n independent Bernoulli trials, each of which is successful with probability q. Let S denote the number of successes in the n trials, and let μ = E[S] = nq. Following Angluin and Valiant [2], we have

Pr[S ≥ (1 + γ)μ] ≤ e^{−γ²μ/3}

for 0 < γ ≤ 1. In our application, n = r(I + T), q = T/I, and μ = r(I + T)T/I. For γ = √(3k₀/log I), r ≥ 1, and T ≥ log² I, we have Pr[S ≥ (1 + γ)μ] ≤ e^{−k₀ log I} ≤ e^{−k₀ ln I}. By making the constant k₀ large enough, we can ensure that Pr[S ≥ (1 + γ)μ] ≤ 1/I^{k₂} for any constant k₂ > 0. Setting r′T = (1 + γ)μ = (1 + √(3k₀/log I)) r(I + T)T/I, we have r′ ≤ r(1 + k₁/√(log I)), for some constant k₁ (that depends on k₀). Let σ = k₁/√(log I). Then r′ ≤ r(1 + σ). Setting p = Pr[S ≥ (1 + γ)μ] = Pr[S ≥ r′T], and k₂ > 10, we have 4pb < 1. Hence, by the Lovász Local Lemma, there is some assignment of delays for which no bad event occurs.
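To spell out why this choice of γ works (a sketch, assuming log I ≥ 3k₀ so that γ ≤ 1, and using r ≥ 1, (I + T)/I ≥ 1, and T ≥ log² I):

\[
\frac{\gamma^{2}\mu}{3} \;=\; \frac{k_0}{\log I}\cdot\frac{r(I+T)T}{I} \;\ge\; \frac{k_0}{\log I}\cdot T \;\ge\; k_0 \log I ,
\]

so Pr[S ≥ (1 + γ)μ] ≤ e^{−k₀ log I} = I^{−k₀/ln 2} ≤ I^{−k₀}, and any polynomially small failure probability 1/I^{k₂} follows by taking k₀ ≥ k₂.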

We now describe the algorithm for finding the assignment. We process the packets one at a time. For each packet, we assign it a delay chosen randomly, uniformly, and independently from 1 to I.


We then examine every event in which the packet participates. We say that the event for an edge g and a T-frame τ is critical if delays have been assigned to C packets that could possibly use g in τ, and, of these, more than CT/I + σr(I + T)T/I packets use g during τ. If a packet causes an event to become critical, then we set aside all of the other packets that could also use g during τ, but whose delays have not yet been assigned. We will deal with the packets that have been set aside later. As we shall see, after one pass of assigning random delays to the packets, the problem is broken into a collection of much smaller subproblems, with high probability.
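The sketch below is a much-simplified, illustrative rendering of one such pass (all identifiers are ours, the criticality test uses the threshold CT/I + σr(I + T)T/I from above, and the event bookkeeping is reduced to a bare minimum). Each packet is modeled by the step at which it would cross each edge on its path with zero delay; a delay x shifts every crossing by x.

```python
# Illustrative sketch of one pass of the Beck-style delay assignment:
# assign random delays one packet at a time; when an (edge, frame) event
# becomes critical, defer the still-unassigned packets that could use it.
import random
from collections import defaultdict

def one_pass(packets, I, r, sigma, frame_sizes, length):
    """packets: dict id -> list of (edge, zero_delay_time). Returns (delays, deferred)."""
    frames = [(T, s) for T in frame_sizes for s in range(max(0, length - T + 1))]
    could = defaultdict(int)   # (edge, frame) -> assigned packets that could use it
    do = defaultdict(int)      # (edge, frame) -> assigned packets that actually use it
    delays, deferred = {}, set()

    def could_use(t0, T, s):   # some delay in [1, I] puts the crossing inside [s, s+T)
        return s <= t0 + I and t0 + 1 < s + T

    for p, path in packets.items():
        if p in deferred:
            continue           # deferred packets are handled in the next pass
        x = random.randint(1, I)
        delays[p] = x
        for edge, t0 in path:
            for T, s in frames:
                if not could_use(t0, T, s):
                    continue
                key = (edge, (T, s))
                could[key] += 1
                if s <= t0 + x < s + T:
                    do[key] += 1
                threshold = could[key] * T / I + sigma * r * (I + T) * T / I
                if do[key] > threshold:          # the event just became critical
                    for q, qpath in packets.items():
                        if q not in delays and any(
                                e == edge and could_use(u, T, s) for e, u in qpath):
                            deferred.add(q)
    return delays, deferred

# Tiny synthetic instance; the parameter values here are arbitrary.
pkts = {i: [(('a', 'b'), i % 3), (('b', 'c'), i % 3 + 1)] for i in range(20)}
print(one_pass(pkts, I=8, r=4, sigma=0.5, frame_sizes=[4, 5], length=16))
```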

In order to proceed, we must introduce some notation. The dependence graph, G, is the graph in which there is a node for each bad event, and an edge between two nodes if the corresponding events share a packet. The degree of G is b, which, as we argued before, is at most I⁸. We say that a node in G is critical if the corresponding event is critical. We say that a node is endangered if its event shares a packet with an event that is critical. Let G³ be the cube of graph G, i.e., in G³ there is an edge between two distinct nodes u and v if in G there is a path of length at most 3 between u and v. The degree of G³ is at most b³.

Our goal now is to show that the number of critical nodes in any connected component consisting of only critical and endangered nodes in G is at most log E, with high probability. Then, since the events in different components do not share any packets, we can view each component as a scheduling problem to be solved in isolation.

The trick to bounding the number of critical nodes in any component is to associate a distinct tree T_U in G³ with each maximal connected subgraph U of G consisting only of critical and endangered nodes. In G³, the critical nodes of U form a connected subgraph because any path u, e₁, e₂, e₃, v that connects two critical or endangered nodes u and v by passing through three consecutive endangered nodes e₁, e₂, e₃ can be replaced by two paths u, e₁, e₂, c₂ and c₂, e₂, e₃, v of length three that each pass through e₂'s critical neighbor c₂. Let T_U be any spanning tree (in G³) of the critical nodes in U. Note that two different sets U and U′ must have different trees T_U and T_{U′} in G³ because they have different sets of critical nodes.

Now let us enumerate the different trees of size t in G³. To begin, a node is chosen as the root. There are at most 2E possible roots. Next, we construct the tree as we perform a depth-first traversal of it. Nodes of the tree are visited one at a time. At each node u in the tree, either a previously unvisited neighbor of u is chosen as the next node to visit (and add to the tree), or the parent of u is chosen to visit (at the root, the only option is to visit a previously unvisited neighbor). Thus, at each node there are at most b³ ways to choose the next node. Since each edge in the tree is traversed once in each direction, and there are t − 1 edges, the total number of different trees with any one root is at most (b³)^{2t−1} < b^{6t}.

We can bound the probability that all of the nodes in any particular tree T_U are critical as follows. First form a maximum set S_U of nodes in T_U that are independent in G³. Let t denote the size of T_U. Then S_U contains at least t/2 critical nodes, because any tree of size t contains an independent set of t/2 nodes. Using the same type of Chernoff-bound calculation that was used to bound p, we can show that for r′ = r(1 + σ) = r(1 + k₁/√(log I)), the probability that any particular event becomes critical is at most 1/I^{k₃} for any k₃ > 0, by making k₁ large enough. Since the nodes in S_U are independent, the corresponding events are also independent. Hence the probability that all of the nodes in T_U are critical is at most 1/I^{k₃t/2}, and the probability that all of the nodes in any of the 2E·b^{6t} trees of size t are critical is at most

2E(b^{6t})/I^{k₃t/2} = 2E·I^{−(k₃/2 − 48)t} (since b ≤ I⁸). For t = log E and any constant k₄, we can make this probability less than 1/E^{k₄} by choosing k₃ to be a sufficiently large constant. Hence, with high probability the size of the largest spanning tree of critical nodes in G³ will be at most log E.

By repeating the same Lovász Local Lemma calculation that we did before, but this time only considering the packets that have not yet been assigned delays, we know that it is possible to assign delays in each component in such a way that no event becomes critical (when considering only the remaining packets). As before, we assign random delays to the packets one at a time. If any event for an edge g and a frame τ becomes critical, then we put aside all of the packets that could use the corresponding edge during τ.

After this second pass, we can bound the probability that any connected component of critical and endangered nodes of G contains more than log log E critical nodes. The calculations are exactly the same as before, except that the number of choices for the root of the spanning tree in each subproblem is now log E, rather than E. Thus, the probability that any particular subproblem has a connected component containing t critical nodes is at most (log E)I^{−(k₃/2 − 48)t}. Hence, for any constant k₄, we can bound the probability that any component contains more than log log E critical nodes by 1/log^{k₄} E by making the constant k₃ large enough. We can no longer claim that every component will have at most log log E critical nodes with high probability, but 1 − (1/log^{k₄} E) is good enough. If some subproblem has a critical node, then we simply solve it again. With high probability, the total number of solutions computed will be at most twice the number of subproblems.

After the second pass, we can again use the Lovász Local Lemma calculation to convince ourselves that it is possible to assign delays in each component in such a way that no event becomes critical (when considering only the remaining packets). At this point, however, it is possible to solve some of the subproblems using exhaustive search. There are two cases to consider, depending on the size of I.

First, suppose that I ≥ (log log E)^a, for some fixed constant 0 < a < 1 (which will be determined later). For any particular subproblem of size log log E, the probability of having even a single critical node is at most (log log E)I^{−(k₃/2 − 48)}. For any fixed a, where 0 < a < 1, we can make this quantity less than 1/2 by choosing k₃ to be large enough. As in the second pass, if any subproblem of size log log E has even a single critical node, we solve it again.

On the other hand, if I < (log log E)^a, then we can complete the schedule with a third pass in which we perform an exhaustive search of the possible assignments of the delays to the packets in each subproblem of size log log E. Since at most r(T + I) packets pass through the edge associated with any critical node, and there are at most I choices for the delay assigned to each packet, the number of different possible assignments for any subproblem containing log log E critical nodes is at most I^{2I² log log E} (for r ≤ I, T < 2 log² I). For I < (log log E)^a, we can make this quantity smaller than log^β E for any fixed constant β, 0 < β < 1, by making a small enough. Hence we need to try out at most log^β E possible delay assignments.

We can bound the total time taken by the algorithm as follows. In each of the first two passes, we randomly assign a delay to each packet, and then check to see if any of the edges used by that packet have become critical. Since there are N packets, and at most E edges to check for each packet, each of the first two passes can be completed in O(NE) time. In the third pass, we solve subproblems containing log log E critical nodes exhaustively if I < (log log E)^a. Since there are at most E subproblems of size log log E, and for each subproblem we may examine as many as log^β E assignments of delays, and for each assignment we must check if any of the (at most) I² edges used by the (at most) 2I² log log E packets are critical, the total time is at most O(E log^β E (log log E)^{4a+1}). For any fixed ε > 0, we can bound this quantity by O(E log^ε E) by making a and β small enough. Hence, the total time used by the algorithm is at most O(NE + E log^ε E).

After the three passes, the total number of packets assigned to any edge in any T-frame is at most

Σ_{i=1}^{3} ( C_i T/I + σr(I + T)T/I ),

where C_i is the number of packets that were assigned delays in the ith pass that could have traversed the edge. Since C₁ + C₂ + C₃ ≤ r(I + T), the number of packets that traverse any edge in any T-frame is at most

r(I + T)T/I + 3σr(I + T)T/I ,

which means that the relative congestion in any T-frame, where log² I ≤ T < 2 log² I, is at most

r((I + T)/I)(1 + 3σ) = r(1 + T/I)(1 + 3σ) ≤ r(1 + O(1/√(log I))),

as claimed. □

4 A parallel scheduling algorithm

At first glance, it seems as though the algorithm that was described in Section 3 is inherently sequential. This is because the decision concerning whether or not to assign a delay to a packet is made sequentially. In particular, a packet is deferred (i.e., not assigned a delay) if and only if it might be involved in an event that became critical because of the delays assigned to prior packets.

In [1], Alon describes a parallel version of Beck's algorithm which proceeds by assigning values to all random variables (in this case delays to all packets) in parallel, and then unassigning values to those variables that are involved in bad events. The Alon approach does not work in this application because we cannot afford the constant-factor blow-up in relative congestion that would result from this process.

Rather, we develop an alternative method for parallelizing the algorithm. The key idea is to process the packets in a random order. At each step, all packets which do not share an edge with an as-yet-unprocessed packet of higher priority are processed in parallel.
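The sketch below simulates this idea (representation and names are ours): packets get random priorities, and in each round every unprocessed packet with no higher-priority unprocessed neighbor in the conflict graph is processed.

```python
# Illustrative sketch: process packets in rounds using random priorities.
# A packet is processed once no conflicting unprocessed packet has higher
# priority; the number of rounds is bounded by the longest decreasing
# priority path in the conflict graph, which is O(D + log N) w.h.p.
import random

def parallel_rounds(conflicts):
    """conflicts: dict packet -> set of conflicting packets (symmetric)."""
    priority = {p: random.random() for p in conflicts}
    unprocessed = set(conflicts)
    rounds = 0
    while unprocessed:
        ready = {p for p in unprocessed
                 if all(priority[q] < priority[p]
                        for q in conflicts[p] if q in unprocessed)}
        unprocessed -= ready          # these are all handled "in parallel"
        rounds += 1
    return rounds

# Random conflict graph on 200 packets with 400 conflicting pairs.
pkts = list(range(200))
conflicts = {p: set() for p in pkts}
for _ in range(400):
    a, b = random.sample(pkts, 2)
    conflicts[a].add(b); conflicts[b].add(a)
print(parallel_rounds(conflicts))
```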

To analyze the parallel running time of this algorithm, we first make a dependency graph G′ with a node for every packet and an edge between two nodes if the corresponding packets can be involved in the same event. Each edge is directed towards the node corresponding to the packet of lesser priority. By Brent's Theorem [4], the parallel running time of the algorithm is then at most twice the length of the longest directed path in G′.

Let D denote the maximum degree of G′. There are at most ND^L paths of length L in G′. The probability that any particular path of length L has all of its edges directed in the same way is at most 2/L! (the factor of 2 appears because there are two possible orientations for the edges). Hence, with probability near 1, the longest directed path length in G′ is O(D + log N). This is because if L ≥ k(D + log N), for some large constant k, then ND^L · (2/L!) ≪ 1.
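To make the last step explicit (a sketch, assuming the constant k is at least 2e and using L! ≥ (L/e)^L):

\[
N D^{L}\cdot\frac{2}{L!} \;\le\; 2N\Bigl(\frac{eD}{L}\Bigr)^{L} \;\le\; 2N\Bigl(\frac{e}{k}\Bigr)^{L} \;\le\; 2N\cdot 2^{-L} \;\le\; 2N\cdot 2^{-k\log N} \;=\; 2N^{1-k} \;\ll\; 1 ,
\]

since L ≥ kD and L ≥ k log N.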

Each packet can be involved in at most rI² log² I events, and at most r(I + T) ≤ O(I²) packets can be involved in the same event. Hence, the degree D of G′ is at most O(I³ log² I). By using the method of Lemma 2.2 as a preprocessing phase, we can assume that c, d, and thus I, are all polylogarithmic in N. Hence, the parallel algorithm runs in NC, as claimed.

5 Remarks

The algorithms described in this paper are randomized, but they can be derandomized using the method of conditional probabilities. Furthermore, the running times can be improved to be nearly proportional to the sum of the lengths of the paths taken by the packets.

References

[1] N. Alon. A parallel algorithmic version of the Local Lemma. Random Structures and Algorithms, 2(4):367–378, 1991.

[2] D. Angluin and L. G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18(2):155–193, April 1979.

[3] J. Beck. An algorithmic approach to the Lovász Local Lemma I. Random Structures and Algorithms, 2(4):343–365, 1991.

[4] R. P. Brent. The parallel evaluation of general arithmetic expressions. Journal of the ACM, 21(2):201–208, April 1974.

[5] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952.

[6] P. Erdős and L. Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. In A. Hajnal et al., editors, Infinite and Finite Sets, Volume 11 of Colloq. Math. Soc. J. Bolyai, pages 609–627. North-Holland, Amsterdam, The Netherlands, 1975.

[7] R. Koch, T. Leighton, B. Maggs, S. Rao, and A. Rosenberg. Work-preserving emulations of fixed-connection networks. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 227–240, May 1989.

[8] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays · Trees · Hypercubes. Morgan Kaufmann, San Mateo, CA, 1992.

[9] F. T. Leighton, B. M. Maggs, A. G. Ranade, and S. B. Rao. Randomized routing and sorting on fixed-connection networks. Journal of Algorithms, 17(1):157–205, July 1994.

[10] F. T. Leighton, B. M. Maggs, and S. B. Rao. Packet routing and job-shop scheduling in O(congestion + dilation) steps. Combinatorica, 14(2):167–180, 1994.

[11] T. Leighton. Methods for message routing in parallel machines. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, pages 77–96, May 1992.

[12] Y. Mansour and B. Patt-Shamir. Greedy packet scheduling on shortest paths. Journal of Algorithms, 14:449–465, 1993.

[13] F. Meyer auf der Heide and B. Vöcking. A packet routing protocol for arbitrary networks. Unpublished manuscript, August 1994.

[14] J. Spencer. Ten Lectures on the Probabilistic Method. SIAM, Philadelphia, PA, 1987.
