parallelization of the branch-and-bound algorithm in

13

Upload: others

Post on 15-Apr-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallelization of the branch-and-bound algorithm in

Scientia Iranica A (2016) 23(2), 407{419

Sharif University of TechnologyScientia Iranica

Transactions A: Civil Engineeringwww.scientiairanica.com

Parallelization of the branch-and-bound algorithm intransportation discrete network design problem

A. Zarrinmehr� and Y. Shafahi

Department of Civil Engineering, Sharif University of Technology, Tehran, Iran.

Received 1 March 2014; received in revised form 4 June 2015; accepted 25 July 2015

KEYWORDSTransportationdiscrete networkdesign;Parallel computing;Parallel branch-and-bound algorithm;Master-slaveparadigm.

Abstract. Transportation Discrete Network Design Problem (TDNDP) aims at choosinga subset of proposed projects to minimize the users' total travel time with respect to abudget constraint. Because TDNDP is a hard combinatorial problem, recent research haswidely addressed heuristic approaches and ignored the exact solution. This paper exploreshow application of parallel computation can a�ect the performance of an exact algorithmin TDNDP. First, we show that the Branch-and-Bound (B&B) algorithm, proposedby LeBlanc, is well adapted to a parallel design with synchronized Master-Slave (MS)paradigm. Then, we develop a parallel B&B algorithm and implement it with two searchstrategies of Depth-First-Search (DFS) and Best-First-Search (BFS). Detailed results overup to 16 processing cores are reported and discussed in an illustrative example of theChicago Sketch network. The results suggest an almost linear speedup for both strategies,which slightly drops as more processing cores are added. When using 16 processing cores,the speedup values of 11.80 and 12.20 are achieved for DFS and BFS strategies, respectively.Furthermore, the BFS strategy reveals a very fast parallel performance by �nding theoptimal solution via the minimum computational e�ort.© 2016 Sharif University of Technology. All rights reserved.

1. Introduction

Transportation Discrete Network Design Problem (TD-NDP) emerges in transportation planning as an urbaninfrastructural decision-making problem. It targets theconstruction of new projects, i.e. roads, in an urbanroads network so as to minimize the total travel timeof network users, while holding the budget constraint.Considering a binary decision variable for each project(whether the project is decided to be constructed ornot), TDNDP will be a combinatorial problem witha binary search space, which exponentially enlargesas the number of projects increases. This is why themain body of related research has neglected the exact

*. Corresponding author. Tel.: +98 21 82883386;Fax: +98 21 82884914E-mail addresses: [email protected] (A.Zarrinmehr); [email protected] (Y. Shafahi)

solutions of the problem and addressed the problemthrough heuristic approaches.

The common idea in all heuristic approaches isto achieve a rather high-quality feasible solution in areasonable amount of time [1]. The main drawbackin the application of heuristic algorithms is that theydo not provide insight into the quality of the achievedsolution. This is because the exact solution to theproblem, as a benchmark, is not available in largeinstances. Achieving the exact solution, however, is acomputing intensive problem, which may require morethan 1-year CPU-time, even in moderate TDNDPs [1].As a result, it is important to explore how the appli-cation of parallel computation can contribute to speedup extraction of an exact solution.

Using parallel computing, this article reinvesti-gates the exact solution of TDNDP as a well-knowntransportation problem. To do so, the conventionalBranch-and-Bound (B&B) algorithm, introduced in

Page 2: Parallelization of the branch-and-bound algorithm in

408 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

the early study of LeBlanc [2], will be considered withtwo standard search strategies of Depth-First-Search(DFS) and Best-Fist-Search (BFS). The algorithmis parallelized using a Master-Salve (MS) paradigm.Application of the algorithm to the Chicago Sketchtransportation network with 12 proposed projects sup-ports that adding more processing cores (up to 16cores in this study) promises a notable reduction incomputation time of the results.

The rest of this paper is organized as follows.Section 2 overviews the background research. Section 3formally describes TDNDP in a bi-level mathematicalprogramming formulation. Some prerequisite conceptsabout B&B algorithms and their parallelization willbe introduced in Section 4. The application of asynchronized MS parallel design to the B&B algorithmin TDNDP is discussed in Section 5, where the parallelalgorithm is presented and some implementation notesare added. The case study is introduced in Section 6and computation results are reported and furtherdiscussed. The paper is concluded and remarks forfuture work are added in Sections 7 and 8.

2. Related research

As a combinatorial problem, TDNDP falls in thecategory of NP-hard complexity class of network designproblems [3]. Combinatorial explosion of such a prob-lem leads to an intractable running-time, which is themain barrier for the application of exact algorithms.As a result, exact solution methods in TDNDP dateback only to a few early studies which were not furtherdeveloped due to their computationally demandingnature [4].

In NP-hard problems like TDNDP, there are var-ious approaches, which can assist in �nding a solutionin a reasonable running-time. One approach is theapplication of heuristics, which is, to trade o� thequality of the found solution against the correspondingrunning-time. In TDNDP, this has been the dominantapproach for which intense literature is available [1,5].However, application of parallel computation is an-other approach which harnesses multiple computationresources to alleviate the computational burden of theproblem [6]. A short overview of both approaches isgiven here.

There have been a variety of approaches, devel-oped in TDNDP literature, to heuristically trade o�the precision of the solution against the computationtime [1]. Some early studies manipulated the problemformulation and obtained approximated solutions forthe problem [4], while others introduced decomposi-tion techniques to overcome the problem size [7,8].Meta-heuristics, as novel heuristics, have also beenextensively used and investigated in recent literature.Various meta-heuristic algorithms have been developed

and compared in TDNDP in categories of genetic, antcolony optimization, simulated annealing, and particleswarm optimization. The interested reader may �ndmore on this topic in a recent review paper [4] and thereferences therein.

Heuristics and meta-heuristics are promising ap-proaches to evade the exponential complexity of TD-NDPs. Although these approaches can reach anappropriate solution in many instances, they do notprovide insight into the goodness of their solution. Asa result, exact solution methods always remain worthnot only to �nd the best solution of the problem, butalso even to test and validate heuristic approaches.

In recent decades, the advent of parallel comput-ing facilities encouraged the exact solution of many NP-hard problems. Promising results have been reportedin the literature for many of such problems [9]. Par-allel algorithms, especially parallel Branch-and-Bound(B&B) algorithms, as a result, were organized in early1980's. These algorithms were further developed dur-ing the last two decades and in various combinatorialproblems [6]. The results were impressive in �nding theexact solution to some problems which had been con-sidered unsolvable for decades. Asymmetric travelingsalesman problems with thousands of cities [10], as wellas instances of the quadratic assignment problem withhundreds of variables [11], are among the problems forwhich the exact solutions were found by devising par-allel branch-and-bound algorithms. Interested readercan �nd more information in [6,9].

In transportation planning discipline, in spite ofthe demanding computational nature of many prob-lems, parallel computing has not received much at-tention so far. Parallelization of the transportationnetwork design problem, in either public transit orurban transportation network design problem, hasbeen the topic of limited research. In case of publictransit network design, no work, to the best of ourknowledge, has been dedicated to addressing a par-allel exact solution. The existing contributions arelimited to the application of parallel meta-heuristicsto the problem. Agrawal and Mathew [12] sug-gested a parallel genetic algorithm for the problemof transit routes network design, while minimizingthe total system costs. They suggested an inher-ent parallelization scheme for the genetic algorithm,in which the �tness function was evaluated, concur-rently, for the solutions. Performance measures werereported for two parallel models of global messagepassing interface and global parallel virtual machine.It was observed that the global parallel virtual ma-chine model would outperform the other one. Yanet al. [13] proposed a model based on paralleliza-tion of the ant colony algorithm for maximizingthe number of direct travelers per unit length in abus network design problem. Parallel performance

Page 3: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 409

measures, however, have not been reported in theirstudy.

In a recent paper, Cooper et al. [14] implementeda parallel multi-objective model based on the geneticalgorithm for minimizing both users' average traveltime and the operator's cost. They suggested appli-cation of an islands approach with a random migrationpolicy to improve the performance of their parallel algo-rithm. In their approach, the population of the geneticalgorithm is split into islands (sub-populations), amongwhich randomly selected solutions are exchanged. Fora transit network with 127 bus stops, when using 24,48, and 96 processing cores, they report speedup valuesof 17.8, 30.1, and 37.0, respectively.

Parallel solutions for urban transportation net-work design problem have also been the topic ofresearch in Zarrinmehr [15] and Zarrinmehr andShafahi [16]. An exact B&B algorithm was undertakenin both of these studies. Zarrinmehr [15] reportedthe parallel performance of a synchronized MS parallelB&B algorithm in a case study of the Chicago Sketchtransportation network. While addressing users' routechoice behavior through system-optimal ows, he re-ported that by using 16 processing cores, the parallelB&B algorithms can achieve speedup values of 10.85and 9.86 in cases of BFS and DFS search strategies,respectively. Zarrinmehr and Shafahi [16] investigatedhow the performance of a parallel B&B algorithmwith DFS strategy in TDNDP could be accelerated.They suggested assigning greedy solutions to the idleprocessors at the start of the B&B algorithm. However,authors reported the performance of their algorithm interms of the total number of parallel iterations ratherthan the real running-times. Their results suggesta potential super-linear speedup in four illustrativeexamples of the Sioux-Falls transportation network.

The real performances of parallel algorithms forrather large transportation networks in TDNDP, toour knowledge, have not been addressed so far in theliterature. This paper explores how parallelizationcan contribute to speed up the solution of a TDNDP.Figure 1 shows how the research roadmap in this studyis narrowed down to �nd an exact solution for thisproblem. As shown in this �gure, the paper willaddress an exact solution for the problem. To achievethat, the paper targets a parallel B&B algorithm basedon the algorithm proposed by LeBlanc [2]. Further,

the corresponding paradigm in this paper will be asynchronous MS parallelization, in which one processorwill hold the main information, generate new tasks, anddistribute them across working processors at prede�nedintervals.

3. Formal description of TDNDP

TDNDP is often known as a problem with a bi-levelprogramming formulation. At the upper level, it seeksan optimum subset of a set of proposed projects (i.e.,roads) in order to minimize the users' total travel timein the network, while at the lower level, it solves aTra�c Assignment Problem (TAP) [1]. To formallydescribe the problem, we use the following notations:

V The set of network vertices;A Set of network links;N(V;A) Current network, which is assumed to

be connected;Ay Set of proposed projects (links) to be

added to the current network;P Set of Origin-Destinations (ODs),

P � A�A;drs The travel demand from r to s,

(r; s) 2 P ;ya Binary decision variable corresponding

to project a, a 2 Ay, taking values 1or 0, indicating whether to construct aproject or not;

y Binary vector of decision variables;Krs The non-empty set of di�erent paths

form r to s, (r; s) 2 P ;A(y) Set of network links after decision y

has been made, A(y) = A [ fa 2 Ay :ya = 1g;

Krs(y) The non-empty set of di�erent pathsform r to s after decision y has beenmade, (r; s) 2 P ;

fk Amount of ow in path k from r to s,k 2 Krs(y);

�ak Binary variable taking values 1 or 0indicating whether link \a" belongs topath k or not, a 2 A(y), k 2 Krs(y);

Figure 1. The scope of this study.

Page 4: Parallelization of the branch-and-bound algorithm in

410 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

xa Amount of ow in link \a", a 2 A(y);x(y) Vector of tra�c ow on network links

after decision y has been made;ta(xa) Volume-delay function of link a,

a 2 A(y), assumed to be convex anddi�erentiable for xa � 0;

ca The cost related to construction ofproject a, a 2 Ay;

B Available budget for the constructionof given projects.

The Upper Level Problem (ULP) of TDNDP,now, can be written as follows:

MinyT (y) =X

a2A(y)

xata(xa); (1)

s.t. :Xa2Ay

caya � B; (2)

ya = 0 or 1; 8a 2 Ay; (3)

x(y) : user equilibrium ow corresponding

to decision vector y: (4)

The User Equilibrium (UE) in the ULP is a situation inwhich for any OD, all paths have an equal travel timewhich is not greater than those of unused paths [17].This situation is resulted by solving a TAP, which canbe formulated in the Lower-Level Problem (LLP):

[LLP(y)] MinX

a2A(y)

Z xa

u=0ta(u)du; (5)

s.t. :X

k2Krs(y)

fk = drs; 8(r; s) 2 P; (6)

fk � 0; 8k 2 Krs(y); 8(r; s) 2 P; (7)

xa =X

(r;s)2P

Xk2Krs(y)

�akfk; 8a 2 A(y): (8)

Henceforth, the above formulation for TAP will bereferred to as a UE-type TAP. In transportation lit-erature, TAP commonly stands for a UE-type prob-lem. However, while addressing the B&B algorithmof LeBlanc [2], we will also need another kind of TAPthat is formulated to �nd System Optimal (SO) ows,namely a SO-type TAP. The SO-type TAP di�ers fromthe UE-type one only in its objective function. It canbe written as follows [17]:

MinyT (y) =X

a2A(y)

xata(xa);

s.t. : (6); (7); (8): (9)

4. Parallel B&B algorithms in combinatorialproblems

Parallelization is a way to speed up B&B algorithmsin order to �nd the exact solution of combinatorialproblems. Although parallelization cannot cope withthe combinatorial nature of NP-hard problems, theexperience of the past two decades suggests that itcan help solving moderate and rather large problemsthat previously seemed to be unsolvable [6,9,18]. Thissection �rst brie y describes the two main searchstrategies of a B&B algorithm (i.e., BFS and DFS)and, then, introduces some prerequisite concepts aboutparallel B&B algorithms.

4.1. B&B search strategiesThere are two main strategies to traverse a B&B tree,often known as BFS and DFS. These strategies arisefrom the selection rule, with which one node is selectedand removed out of active nodes. If the selectionpriority is given to a node with the least lower boundvalue, the strategy of BFS emerges. Also, if thedeepest node in the B&B tree is given the priority,the corresponding strategy would be called DFS [18].To investigate the performance of di�erent strategies,researchers categorized nodes of the B&B tree (partialsolutions) into three distinct sets [6,9,18]. Let f� be theoptimal objective function of a combinatorial problem;then, categories are as follow:

� Critical nodes: Those nodes with lower bound valuessmaller than f�;

� Undecidable nodes: Those nodes with lower boundvalues equal to f�;

� Eliminable nodes: Those nodes with lower boundvalues greater than f�.

The minimal subset of the B&B tree that mustbe traversed in order to �nd the optimal solution andprove its optimality, called the minimal tree [6,18]. Theminimal tree includes all critical nodes and maybe someundecidable nodes. There is no guarantee that anyB&B algorithm restricts its search to the minimal tree,because the traversed tree is created dynamically in thecourse of the algorithm and it has an irregular structurewhich is not known at the outset [6].

The BFS strategy focuses the search on criticaland undecidable nodes and commonly reveals a robustperformance. The main drawback of the BFS strategyis its required memory which may grow exponentiallyin the worst case. On the other hand, the DFSstrategy may evaluate lots of eliminable nodes andbecome more time-consuming, but it requires muchless memory and can be e�ective in dealing withlarge-size problems. Finally, using a proper strategyalways remains dependent to the problem at hand andavailable resources [6,18].

Page 5: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 411

4.2. Parallelization of B&B algorithmsThere are various approaches to search a B&B treein parallel. In a node-based approach, the relatedcomputation at each node of the B&B tree (e.g., lowerbound calculation) is parallelized over processors. Atree-based approach, on the other hand, is one of themost famous approaches in which processors work ondi�erent parts of the tree, simultaneously. Tree-basedapproaches may have interesting e�ects on the behaviorof the algorithm and, therefore, they have been widelystudied by researchers [18]. Through all various tree-based approaches, this paper focuses on a specialtopic, namely MS paradigm; but, beforehand, somequantitative measures in parallel computing should beintroduced.

Assume that T (1) and T (p) are running-times of aprogram when it is run on 1 and p processors, respec-tively. Then, S(p) as the speedup of parallelizationwith p processors is de�ned as follows [19]:

S(p) =T (1)T (p)

: (10)

Speedup is an indicator of how many times the programwould run faster due to parallelization. According toAmdahl's law in parallel computing, it can be writtenthat [19]:

1 < S(p) < p: (11)

A linear speedup is a speedup that almost equals thenumber of processors. Achieving a linear speedupshows an e�ective parallelization.

In parallel B&B algorithms, Amdahl's law, how-ever, is not always held. This behavior is often knownas an \anomaly" [20]. A super-linear speedup is akind of anomaly in which the speedup exceeds thenumber of processors. Super-linear speedup arises dueto parallelization in B&B algorithms when an earlyaccess of processors to a good solution brings aboutdeletion of further active nodes and reduction of thesearch in the B&B tree [21].

Studying speedup bounds is not trivial in par-allel B&B algorithms due to complexities and non-deterministic behaviors. Nevertheless, researchers havestudied, theoretically, a few parallel paradigms. Asynchronized MS paradigm is one of these paradigms,which can be brie y described as follows [18]: A mainprocessor, namely the master, holds the importantinformation, e.g. the B&B tree, the incumbent, etc.The master distributes active nodes as jobs amongother processors, namely the slaves (which is a MSparadigm). The slaves receive the nodes, evaluaterelated bounds, and return the results to the mas-ter at prede�ned intervals (which is a synchronizedparadigm). The master then updates its informationand generates new jobs if needed.

The theoretical study of synchronized MSparadigm was accompanied with 2 assumptions foranalysis simpli�cation:

� The master-slave data exchange is done in a zerotime;

� All slaves evaluate received nodes in a uniformamount of time (known as grain-size uniformity).

With the above assumptions, if I(1) and I(p)stand for the number of B&B iterations with 1 andp processors, respectively, then the real-case speedup,S(p), can be estimated by the theoretical speedup,St(p), as follows [20,22,23]:

S(p) ' St(p) =I(1)I(p)

: (12)

Theoretical studies of anomalous behaviors in parallelB&B algorithms with MS paradigm were organized inthe middle of 80s. These studies proved that anomalousbehaviors stem from the ambiguity in the selection ofundecidable nodes when they emerge as selection ties.As a result, one can say that Amdahl's law (previouslypresented in Inequality (11)) is theoretically held inB&B trees with empty set of undecidable nodes.

5. Methodology

This section discusses how a synchronized MS paral-lelization paradigm is adapted for parallelization of theB&B algorithm in TDNDP and presents a owchartand corresponding implementation notes for the par-allel algorithm. But we require, beforehand, a quickoverview of LeBlanc's sequential B&B algorithm, towhich will be referred in next subsections.

5.1. The sequential algorithmAs previously mentioned, this paper addressesLeBlanc's B&B algorithm as the basic sequential al-gorithm to be parallelized. To exactly solve TDNDP,LeBlanc organized a B&B algorithm for TDNDP, inwhich the given projects were decided whether to beconstructed or not at subsequent levels of a binary tree.Assuming n to be the number of given projects, theB&B tree, introduced by LeBlanc, is a rooted binarytree with n levels in which, at the root, all projects areinitially undecided. The decision whether to constructproject i or not (i = 1; 2; :::; n) is made at level i of thetree. Figure 2 shows the structure of the tree, in whicheach node is coded by a string of n binary variables.Symbols 1, 0, or - at the ith place respectively indicatethat the project i is decided whether to be constructed,not to be constructed, or yet undecided (as what is thecase for partial solutions at levels 1 to n� 1).

LeBlanc's B&B algorithm iteratively decomposespartial solutions and when a budget-wise feasible and

Page 6: Parallelization of the branch-and-bound algorithm in

412 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

Figure 2. The structure of the B&B tree in TDNDP inLeBlanc's algorithm.

complete solution emerges, solves a UE-type TAP toevaluate it [2]. To achieve a lower bound on partialsolutions, LeBlanc also relaxed Constraints (2) and(4), added Constraints (6), (7), and (8) to the ULP;and proved that the objective function of the emergingproblem would work as a lower bound. This lowerbound can be calculated by solving a SO-type TAPin which all undecided projects are assumed to beconstructed [2]. Figure 3 provides a owchart forLeBlanc's B&B algorithm. In this �gure, the term nodestands for a B&B tree node and the terms right and left

children are de�ned in the same way as that shown inFigure 2.

5.2. Application of a synchronized MSparallelization paradigm on the problem

A synchronized MS paradigm of parallelization welladapts LeBlanc's B&B algorithm in TDNDP. Figure 4shows this paradigm in a diagram. According to thediagram, the master processor holds the main infor-mation of the B&B algorithm. Each parallel iterationstarts with generating new nodes to be evaluated.The slave processors that have already maintained thenetwork information receive the newly generated nodesin the form of binary string codes. Each slave processordecodes the received binary string, performs a tra�cassignment on the corresponding network, and sends itsevaluation back to the master. The parallel iteration�nishes as the master receives the evaluation data andupdates the incumbent and active nodes.

Choosing an appropriate parallelization paradigmfor a problem is highly a�ected by both communicationand idling overheads in the parallel program [19]. Twomain reasons, in this paper, support the application ofa synchronized MS paradigm:

� Communication overhead due to a centralizedparadigm is negligible;

Figure 3. Flowchart for LeBlanc's B&B algorithm.

Page 7: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 413

Figure 4. Diagram of parallelization.

� Idling overhead due to a synchronized paradigm isnot much.

5.2.1. Communication overheadIn parallel computing, the communication betweenprocessors usually results in communication overhead.This is the case particularly for MS paradigms whenthe access to the master may become a bottleneck [18].Communication overhead is totally dependent on thetask grain size of processors and the amount of dataexchanged between them [19]. The parallel B&Balgorithm in this paper has a coarse-grain structure,because processors will solve a TAP at each paralleliteration, which is a rather time-consuming proce-dure.

Furthermore, data exchange between processorswill be restricted to only sending and receiving thebinary strings and related evaluations at each iteration.To achieve this, all the required information (network,demand matrix, and projects) will initially be loadedover slave processors. Afterwards, each slave processorreceives a binary string as a solution to be evaluated,adjusts the underlying network with the received infor-mation, solves a TAP over that, and sends its evaluatedvalue back to the master.

5.2.2. Idling overheadIdling overhead often takes place in synchronized par-allelization when processors must communicate at pre-de�ned intervals and idling time may be incurred tosome processors due to synchronization [18]. Unifor-mity of the grain-size, therefore, plays an importantrole in alleviating idling overhead. In the parallel B&Balgorithm used in this paper, we assign each processorto solve one TAP (i.e., evaluate one node) at each iter-ation. The computational complexity of TAPs mainlydepends on the network size and con�guration, whichremains almost unchanged in the real networks by

Figure 5. Lower bound evaluation in the B&B tree.

adding or deleting a couple of projects. However, therestill remains another problem to parallelize LeBlanc'salgorithm with a uniform grain-size.

Let us de�ne a node expansion in LeBlanc's B&Btree as creation and evaluation of feasible children ofone node. Then, the potential problem occurs whenthere is a di�erence between expanding nodes of levels1; 2; :::; n�2 and those of level n�1. For nodes of levels1; 2; :::; n � 2, the expansion can be done by exactlysolving one TAP as shown in Figure 5. This is becauseafter assuming undecided projects to be constructed(required for lower bound calculation), the right child,which takes the new decision of construction, will havethe same lower bound as that of its father and needsno further bound calculation. Therefore, expansion ofsuch nodes is computationally equivalent to solving oneTAP only for the left child.

The expansion of nodes of level n � 1, however,may require solving two TAPs rather than one. Bothchildren of those nodes, if feasible, are complete so-lutions for which a UE-type TAP must be solved forevaluation. This problem opposes the uniformity of thegrain-size in our parallel algorithm, but can be tackledby a simple modi�cation in the B&B tree, shown inFigure 6.

According to the modi�cation shown in Figure 6,for any node of level n� 1, if the right child thereof isalso feasible (Figure 6(a)), we duplicate the nodes each

Page 8: Parallelization of the branch-and-bound algorithm in

414 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

Figure 6. The B&B tree at level n� 1.

for one child (Figure 6(b)). The priority is then greedilygiven to the right child with one more constructedproject. Therefore, in the modi�ed tree, expansion ofthe nodes of level n � 1, as well as the nodes of levels1; 2; :::; n � 2, will computationally equal to one TAPsolution and at the same time, the grain-size uniformityof parallel algorithm is also preserved.

5.3. Parallelization of the sequential algorithmThe focus of this paper in parallelization of LeBlanc'sB&B algorithm is on the stage of node expansion,namely parallel nodes expansion. The parallel B&Balgorithm can be viewed from two points: the masterprocessor and the slaves. Figure 7 represents owchartsof the parallel algorithm for both master and slaves.Most steps in this �gure are similar to those in Figure 3.The main di�erences are at the stages of parallel nodesexpansion, slaves receiving massages and working onthem, and the termination part where a parallel featureis applied to the algorithm.

5.4. Implementation notesHere, we add some implementation notes about thealgorithm:

� The parallel algorithm was implemented in Javaprogramming language and the library MPJ wasexploited for parallelization;

Figure 7. Flowcharts for the parallel B&B algorithm.

Page 9: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 415

Table 1. De�nition of projects in the Chicago Sketch network.

Projectnumber

Fromvertex

Tovertex

fftta(minutes)

ca(vehicles/hour)

ba(units of budget)

1 29 28 3.52 12000 162 3 5 3.73 12000 163 397 403 4.65 12000 174 605 399 5.37 12000 185 362 360 7.20 12000 206 407 681 7.55 12000 227 550 564 7.73 12000 228 580 568 10.10 12000 229 766 779 10.61 12000 2310 42 50 14.20 12000 2611 250 217 17.64 12000 2912 527 529 18.03 12000 33

� TAPs were solved using the convex combinationalgorithm [17];

� The B&B algorithm was implemented with twostrategies of DFS and BFS. This was done inthe master processor by giving the correspondingselection priorities when a node was selected andremoved from active nodes;

� Since the TAP brought about a coarse-grain par-allelization, especially for real-life transportationnetworks, the master would experience a notableidling overhead. Therefore, the master was alsodesignated to solve a TAP in conjunction withslaves;

� The program was �nally run on two clusters, eachwith 2 CPUs of type Intel Xeon E5504 2.00 GHz.With 4 processing cores at each CPU, the totalnumber of 16 processing cores were available toparallelize the program.

6. Computational performance

6.1. Case study de�nitionThe case study of this paper, over which parallel perfor-mance and results of the algorithm are reported, is theChicago Sketch network. The Chicago Sketch networkis an aggregated, but yet a rather large, representationof the Chicago region network [24]. It contains 933vertices, 2950 links, and 93513 non-zero trip demandvalues in the OD matrix. The data for demand inthe Chicago Sketch network provides very low levelsof congestion, which may not be naturally subjected toa TDNDP. The overall demand matrix, as a result, isinitially doubled (suggested in Bar-Gera [24]) to makethe underlying network more congested and realisticfor design applications.

In order to introduce a set of projects to select

from, 12 arti�cial projects were proposed following thepattern of network links. The volume-delay functionsfor projects, as well as network links, follow thetraditional BPR format:

ta(xa) = fftta

1 + 0:15

�xaca

�4!; (13)

in which:fftta is the free- ow travel time on project a

(minutes),ca is the practical capacity of project a

(vehicles/hour),ta(xa) is the travel time on project a

(minutes).Table 1 presents detailed information of the projectsbased on the BPR function. In this table, ba also standsfor the budget (units of budget) required to constructproject a. The total available budget is assumed tobe 100 (units of budget), which is a mid-size budgetlevel. Although the capacity of 12000 (vehicles/hour)sounds rather large for the projects, it is not yet beyondthe capacity range of the existing links of the ChicagoSketch network. In this network, there are morethan 30 non-connector links, for which the practicalcapacity is 12000 or even more vehicles/hour [24].Addressing projects with low-capacities, however, mayrequire application of more accurate TAP solutionalgorithms compared to the classic convex-combinationalgorithm (Frank-Wolfe algorithm) used in this pa-per [25].

6.2. Numerical resultsBased on the parallel synchronized MS algorithmpreviously introduced, the program was run with twostrategies of DFS and BFS. Related results for 1 to 16

Page 10: Parallelization of the branch-and-bound algorithm in

416 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

Table 2. Parallel computation results.

(a) DFS strategyNumber ofprocessing

Total numberof evaluated

Number of evaluated nodes oftype

Total numberof parallel

Running-time,T (p) (minutes)

cores1, p nodes Critical Undecidable Eliminable iterations, I(p)

1 1013 767 0 246 1013 271.712 1013 767 0 246 508 144.834 1019 767 0 252 258 76.498 1029 767 0 262 132 42.5716 1062 767 0 295 69 23.02

(b) BFS strategyNumber ofprocessing

Total numberof evaluated

Number of evaluated nodes oftype

Total numberof parallel

Running-time,T (p) (minutes)

cores1, p nodes Critical Undecidable Eliminable iterations, I(p)

1 767 767 0 0 767 208.092 767 767 0 0 384 110.084 767 767 0 0 193 56.718 767 767 0 0 99 31.3916 767 767 0 0 53 17.06

1This number includes the master processor as well, as previously mentioned in Subsection 5.4.

Table 3. Maximum number of active nodes for strategiesof DFS and BFS.

Number ofprocessing cores, p

The maximum numberof active nodes

DFS BFS

1 6 2322 12 2304 22 2298 40 22916 76 230

processing cores are shown in Table 2. It must be notedthat the data is not known in advance and it will beextracted after running the programs.

The required memory for both strategies of DFSand BFS can also be estimated by the maximumnumber of active nodes during the program execution,as represented in Table 3.

After running the program, the optimal solu-tion, as a binary string of length 12, was found[000100111000] with the objective function of 65045456vehicle-minutes.

6.3. Speedup, anomalous behavior, andmemory analysis

Based on the results of Table 2, the program with theBFS strategy reveals a notably faster performance thanthat of the strategy DFS. As Table 2(a) suggests, thereare many eliminable nodes in the traversal of DFSstrategy. As a result, much more parallel iterations

Figure 8. The running-time of parallel programs.

in the DFS strategy are required to evaluate theseextra nodes. According to Table 2(b), the numberof eliminable nodes for the BFS strategy remainszero even when the number of processors increasesto 16. This is why the strategy BFS reveals a notablyfaster performance than the strategy DFS, as shown inFigure 8, in which running-time curves are plotted forboth strategies.

As previously discussed in theoretical studiesabout parallel B&B algorithms, anomalous behaviors(that contradict with Amdahl's law) may occur whenthere are selection ties over undecidable nodes in theB&B tree [20]. In our TDNDP instance, however,Table 2 suggests that there is no undecidable node forboth parallel strategies of BFS and DFS. As a result,Amdahl's law theoretically holds for the synchronized

Page 11: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 417

Figure 9. Speedup curves in the Chicago Sketch case study.

MS parallelization of the B&B algorithm in TDNDPinstance of this paper.

Con�ning the search to the minimal B&B treeis other interesting information that can be extractedfrom Table 2(b) for the BFS strategy. As shown inthis table, the BFS strategy traverses neither of theeliminable nodes nor the undecidable ones. This clearlyadapts to what has been previously introduced as theminimal tree in B&B algorithms. It is noteworthy thateven for 16 parallel processing cores, this behavior inachieving the solution through the minimum computa-tional e�ort is consistent.

The required memory for strategies DFS and BFScan be compared using Table 3. The required memoryfor the DFS strategy is much lower than that for theBFS. It increases linearly at �rst, but slightly dropsfor higher values of p. For the BFS strategy, thisbehavior seems quite di�erent. The required memoryfor this strategy is rather high, but it remains almostunchanged by addition of more processing cores.

Speedup curves are the other information. Thereal speedup, S(p), and theoretical speedup, St(p),(introduced, respectively, in Eqs. (10) and (12)) arecalculated in Table 4. Corresponding curves forstrategies DFS and BFS are also shown in Figure 9.This �gure indicates that real speedup values arelower than the theoretical ones. This gap is mainlydue to a synchronized design, in which a short lag

Table 4. Speedup results for the strategies of DFS andBFS in the Chicago Sketch case study.

Number ofprocessing

DFS speedupvalues

BFS speedupvalues

cores, p St(p) S(p) St(p) S(p)

1 1.00 1.00 1.00 1.002 1.99 1.88 2.00 1.894 3.93 3.55 3.97 3.678 7.67 6.38 7.75 6.6316 14.68 11.80 14.47 12.20

in one processing core is incurred to all of themas an idling overhead. Although the gap betweenreal and theoretical speedups, shown in Figure 9,slightly increases for higher values of p, both strategies,DFS and BFS, reveal a notable increment in theirreal speedups by adding more processors up to 16processing cores in this study. Both strategies achievea real speedup of almost 12 (12.20 and 11.80 for BFSand DFS, respectively) when 16 processing cores areused for parallelization. These are slightly more than,but still similar to, the speedup values reported byZarrinmehr [15] (10.85 for BFS and 9.86 for DFSstrategies) who solved the problem with a normaldemand (not doubled as in this study) and a system-optimal ow pattern.

7. Conclusion

This paper focused on a parallel exact solution for TD-NDP as a computing-intensive problem in transporta-tion literature. Based on LeBlanc's B&B algorithm,we discussed that a synchronized MS parallelizationparadigm adapted well to the solution algorithm due toprocessors' negligible communication and little idlingoverhead. The parallel B&B algorithm was developedand implemented in Java with search strategies of BFSand DFS. In a TDNDP instance with 12 projects,parallelization results over up to 16 processing coresin the Chicago Sketch network supported that:

� Due to traversing through B&B tree in the minimalcomputational e�ort, the parallel B&B algorithmwith BFS strategy attained a notable faster perfor-mance than that of DFS. Consequently, the run-timefor BFS strategy when using 16 processing cores was17.06 minutes while it was 23.02 minutes for thestrategy DFS;

� Both parallel strategies of BFS and DFS achievedan almost linear speedup, which slightly droppedby adding more processing cores. When using 16processing cores, the achieved speedup values were

Page 12: Parallelization of the branch-and-bound algorithm in

418 A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419

12.20 and 11.80 for the BFS and DFS strategies,respectively;

� The proposed algorithms were not subjected to ananomalous behavior of parallel B&B algorithms;

� The required memory for the strategy DFS increasedalmost linearly and proportional to the number ofprocessing cores. The BFS strategy, on the otherhand, required much more memory. However, thismemory remained almost consistent as the numberof cores changed.

8. Future work

To further extend the results of this paper, it isinteresting to run the parallel B&B algorithms on mas-sive parallelization facilities. In that case, applicationof other parallelization paradigms (e.g., asynchronousparadigms) may further reduce the idling and commu-nication overheads and result in a more scalable paralleldesign. Sensitivity analysis over budget, demand level,network, and size of the projects is also a potential di-rection for future research. It would also be interestingto apply novel e�cient TAP features, which can allowfor enhancing the run-time as well as addressing low-capacity projects in the problem.

Acknowledgment

Authors wish to thank Prof. Hossain Poorzahedy forproviding valuable comments during the research aswell as Morteza Parvizi Omran and Saied Zarrinmehrfor their generous cooperation. The IPM Grid Comput-ing Group is sincerely acknowledged for providing facil-ities to run parallel programs. Authors also gratefullythank two anonymous referees and the editorial o�ceof Scientia Iranica for providing constructive commentson this paper.

References

1. Poorzahedy, H. and Abulghasemi, F. \Application ofant system to network design problem", Transporta-tion, 32(3), pp. 251-273 (2005).

2. LeBlanc, L.J. \An algorithm for the discrete networkdesign problem", Transport. Sci., 9(3), pp. 183-199(1975).

3. Ben-Ayed, O., Boyce, D.E. and Blair, C.E. \A generalbilevel linear programming formulation of the networkdesign problem", Transport. Res. B-Meth., 22(4), pp.311-318 (1988).

4. Poorzahedy, H. and Turnquist, M.A. \Approximatealgorithms for the discrete network design problem",Transport. Res., 16B, pp. 45-55 (1982).

5. Farahani, R.Z., Miandoabchi, E., Szeto, W.Y. andRashidi, H. \A review of urban transportation network

design problems", Eur. J. Oper. Res., 229(2), pp. 281-302 (2013).

6. Roucairol, C. \Parallel processing for di�cult combi-natorial optimization problems", Eur. J. Oper. Res.,92(3), pp. 573-590 (1996).

7. Dantzig, G.B., Harvey, R.P., Lansdowne, Z.F., Robin-son, D.W. and Maier, S.F. \Formulating and solvingthe network design problem by decomposition", Trans-port. Res. B-Meth., 13(1), pp. 5-17 (1979).

8. Solanki, R.S., Gorti, J.K. and Southworth, F. \Usingdecomposition in large-scale highway network designwith a quasi-optimization heuristic", Transport. Res.B-Meth., 32(2), pp. 127-140 (1998).

9. Crainic, T.G., Le Cun, B. and Roucairol, C. \Paral-lel branch-and-bound algorithms", In Parallel Comb.Optim., John Wiley & Sons, pp. 1-28 (2006).

10. Pekny, J.F. and Miller, D.L. \A parallel branch andbound algorithm for solving large asymmetric travelingsalesman problems", Math. Program., 55(1-3), pp. 17-33 (1992).

11. Anstreicher, K., Brixius, N., Goux, J.P. and Linderoth,J.T. \Solving large quadratic assignment problems oncomputational grids", Math. Program., 91(3), pp. 563-588 (2002).

12. Agrawal, J. and Mathew, T.V. \Transit route networkdesign using parallel genetic algorithm", J. Comput.Civil Eng., 18(3), pp. 248-256 (2004).

13. Yang, Z., Yu, B. and Cheng, C. \A parallel ant colonyalgorithm for bus network optimization", J. Comput.Aided Civil Infrastruct. Eng., 22(1), pp. 44-55 (2007).

14. Cooper, I.M., John, M.P., Lewis, R., Mumford, C.L.and Olden, A. \Optimising large scale public trans-port network design problems using mixed-mode par-allel multi-objective evolutionary algorithms", IEEECongress on Evolut. Comput. (CEC), Beijing, China,pp. 2841-2848 (2014).

15. Zarrinmehr, A. \Discrete network design using parallelbranch-and-bound algorithms", M.Sc. Thesis, Dept. ofCivil Eng., Sharif University of Technol., Tehran, Iran(2011).

16. Zarrinmehr, A. and Shafahi, Y. \Accelerating theperformance of parallel depth-�rst-search branch-and-bound algorithm in transportation network designproblem", Proceedings of the Int. Conf. on Oper. Res.and Enterprise Sys. (ICORES), pp. 359-366 (2015).

17. She�, Y., Urban Transportation Networks: Equilib-rium Analysis with Mathematical Programming Meth-ods, In Prentice Hall, Englewood Cli�s, NY, USA(1985).

18. Gendron, B. and Crainic, T.G. \Parallel branch-and-branch algorithms: Survey and synthesis", Oper. Res.,42(6), pp. 1042-1066 (1994).

19. Barney, B., Introduction to Parallel Computing, InLawrence Livermore National Library (2010).

20. Lai, T.H. and Sahni, S. \Anomalies in parallel branch-

Page 13: Parallelization of the branch-and-bound algorithm in

A. Zarrinmehr and Y. Shafahi/Scientia Iranica, Transactions A: Civil Engineering 23 (2016) 407{419 419

and-bound algorithms", Commun. ACM, 27(6), pp.594-602 (1984).

21. Pruul, E., Nemhauser, G. and Rushmeier, R. \Branch-and-bound and parallel computation: A historicalnote", Oper. Res. Lett., 7(2), pp. 65-69 (1988).

22. Li, G.J. and Wah, B.W. \Coping with anomaliesin parallel branch-and-bound algorithms", IEEE T.Comput., 100(6), pp. 568-573 (1986).

23. Li, G.J. and Wah, B.W. \Computational e�ciencyof parallel combinatorial or-tree searches", IEEE T.Software Eng., 16(1), pp. 13-31 (1990).

24. Bar-Gera H. \Transportation network test problems",http://www.bgu.ac.il/ bargera/tntp (2015).

25. Boyce, D., Ralevic-Dekic, B. and Bar-Gera, H.\Convergence of tra�c assignments: How much isenough?", J. Transp. Eng., 130(1), pp. 49-55 (2004).

Biographies

Amirali Zarrinmehr started his undergraduate stud-ies in Computer Engineering and continued his BS

in Civil Engineering at University of Tehran, Tehran,Iran. He earned an MS degree in TransportationPlanning from Sharif University of Technology, Tehran,Iran. He is currently a PhD candidate of Highway andTransport Engineering at Tarbiat Modares Universityof Tehran, Tehran, Iran. His main research interestsare design and application of combinatorial algorithmsarising in transportation discipline. He is currentlyvisiting the Northwestern University, IL, USA, as a pre-doctoral sabbatical scholar.

Yousef Shafahi received his BS from Shiraz Uni-versity, Shiraz, Iran, MS from Isfahan University ofTechnology, Isfahan, Iran, and PhD from Universityof Maryland, College Park, MD, USA. He joined theDepartment of Civil Engineering at Sharif Universityof Technology, Tehran, Iran, as an Assistant Professorin 1998 and is now professor and head of transportationdivision in this department. His research interestsare: urban transportation planning; road, rail, and airtransportation; transit networks design and scheduling;applications of operation research; soft computing; andsimulation in transportation.