Parallelization of Dijkstra's Algorithm


  • Programming project: Dijkstra's algorithm

    Summer school: "Introduction to High Performance Computing"

    Center for High-Performance Computing & KTH School of Computer Science and Communication

    Salvator Gkalea, Department ICT/ES, KTH
    Dionysios Zelios, Department of Physics, Stockholm University
    [email protected], [email protected]

    Tutor: Stefano Markidis

    Date: October 2014

  • Contents

    Abstract
    Introduction
    Dijkstra's Parallelization
    Simulation results and analysis
    References

  • Abstract

    Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest-path tree. The algorithm is often used in routing and as a subroutine in other graph algorithms. In this project we investigate the parallelization of this algorithm and its speedup against the sequential version. The Message Passing Interface (MPI) is used to parallelize the single-source shortest-path algorithm.

    Introduction

    Dijkstra's algorithm determines the shortest path from a single source vertex to every other vertex. It can also be used to find the cost of the shortest path from a single source vertex to a single destination vertex, by stopping the algorithm once the shortest path to the destination vertex has been determined. For instance, if the vertices of the graph represent cities and edge costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities.

    A pseudo-code representation of Dijkstra's sequential single-source shortest-paths algorithm is given below. The algorithm proceeds as follows:

  • 1.  function Dijkstra(V, E, w, s)
    2.  begin
    3.      VT := {s};
    4.      for all v ∈ (V - VT) do
    5.          if (s, v) ∈ E then l[v] := w(s, v);
    6.          else l[v] := ∞;
            end for
    7.      while VT ≠ V do
    8.      begin
    9.          find a vertex u such that l[u] := min{ l[v] | v ∈ (V - VT) };
    10.         VT := VT ∪ {u};
    11.         for all v ∈ (V - VT) do
    12.             l[v] := min{ l[v], l[u] + w(u, v) };
    13.     end while
    14. end Dijkstra

    Suppose we are given a weighted graph G = (V, E, w), where V is the set of vertices, E the set of edges and w the weight function. The pseudo-code above computes the shortest path from the source vertex s to every other vertex v. For every vertex v in (V - VT), the algorithm stores in l[v] the minimum cost found so far to reach v from s. In lines 4-6 the procedure initializes the l[] array with the weight of the edge from s to each vertex, or ∞ if no such edge exists. In the remaining lines, the algorithm repeatedly finds the vertex u closest to s among all those not yet examined (line 9), adds it to VT, and then updates the minimum distances in the l[] array (lines 11-12).
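
    To make the pseudo-code concrete, here is a minimal sequential C sketch of it; the adjacency-matrix representation, the fixed source vertex and the small example graph are illustrative assumptions, not part of the original project code.

    #include <stdio.h>
    #include <limits.h>

    #define NV 5                /* number of vertices (example size, assumed) */
    #define INF (INT_MAX / 2)   /* "infinity" that cannot overflow when added to */

    /* Sequential Dijkstra: fills l[] with shortest distances from source s.
       graph[u][v] holds w(u, v), or INF if there is no edge (u, v). */
    void dijkstra(int graph[NV][NV], int s, int l[NV]) {
        int inT[NV] = {0};                      /* inT[v] = 1 once v is in VT */
        for (int v = 0; v < NV; v++)            /* lines 3-6: initialize l[] */
            l[v] = (v == s) ? 0 : graph[s][v];
        inT[s] = 1;
        for (int step = 1; step < NV; step++) { /* line 7: until VT = V */
            int u = -1;
            for (int v = 0; v < NV; v++)        /* line 9: closest vertex not in VT */
                if (!inT[v] && (u < 0 || l[v] < l[u]))
                    u = v;
            inT[u] = 1;                         /* line 10: VT := VT U {u} */
            for (int v = 0; v < NV; v++)        /* lines 11-12: relax through u */
                if (!inT[v] && l[u] + graph[u][v] < l[v])
                    l[v] = l[u] + graph[u][v];
        }
    }

    int main(void) {
        /* small symmetric example graph, weights chosen arbitrarily */
        int g[NV][NV] = {
            {0,   4,   1,   INF, INF},
            {4,   0,   2,   5,   INF},
            {1,   2,   0,   8,   10 },
            {INF, 5,   8,   0,   2  },
            {INF, INF, 10,  2,   0  },
        };
        int l[NV];
        dijkstra(g, 0, l);
        for (int v = 0; v < NV; v++)
            printf("shortest distance 0 -> %d: %d\n", v, l[v]);
        return 0;
    }

    The scan in line 9 and the relaxation in lines 11-12 dominate the running time, and these are exactly the parts that the parallel version distributes across processes.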

    The main idea behind parallelizing this problem is that every process finds, among the vertices assigned to it, the one that is closest to the source. A reduction is then performed to determine the globally closest vertex, and the result is broadcast to all processes. Finally, each process updates its local part of the distance array (the l[] array). A sketch of this reduce-and-broadcast step is given after the function list below.

    This kind of parallel computing can be achieved with a few basic MPI communication functions. These functions are described below and are used to parallelize the single-source shortest-paths algorithm.

  • MPI_Init — Initializes the parallel environment

    MPI_Finalize — Ends the parallel environment and releases system resources

    MPI_Comm_size — Returns the number of parallel computing units (processes)

    MPI_Comm_rank — Returns the identification number (rank) of the current computing unit

    MPI_Reduce — Reduces values from all processes to a single value

    MPI_Bcast — Broadcasts a message from the process with rank "root" to all other processes of the communicator

    MPI_Gather — Gathers together values from a group of processes
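
    The reduce-and-broadcast step mentioned above hinges on the MPI_MINLOC reduction operation, which combines (value, index) pairs by keeping the smallest value together with the index that produced it. The following is a minimal self-contained sketch of just that step; the made-up (distance, vertex) pair is our own illustration, not the report's code.

    #include <stdio.h>
    #include <mpi.h>

    int main(int ac, char **av) {
        int rank;
        MPI_Init(&ac, &av);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each process contributes a (distance, vertex) pair; here the
           values are fabricated purely for illustration */
        int localMin[2] = { 100 - 7 * rank, rank };  /* [0] distance, [1] vertex */
        int globalMin[2];

        /* rank 0 reduces the pairs, keeping the smallest distance and the
           vertex that attains it ... */
        MPI_Reduce(localMin, globalMin, 1, MPI_2INT, MPI_MINLOC, 0, MPI_COMM_WORLD);
        /* ... and broadcasts the winning pair back to every process */
        MPI_Bcast(globalMin, 1, MPI_2INT, 0, MPI_COMM_WORLD);

        printf("process %d: global min distance %d at vertex %d\n",
               rank, globalMin[0], globalMin[1]);
        MPI_Finalize();
        return 0;
    }

    The same effect can be obtained in a single call with MPI_Allreduce; the two-call form is shown here because it matches the reduce-then-broadcast description above.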

  • Dijkstra's Parallelization

    The pseudo-code above can be parallelized based on line 9, where the minimum distance among all the vertices is computed, or based on line 11, where the algorithm folds the overall minimum into the local minimum-distance array. The classic approach is to break the data to be computed into segments; each node is then responsible for processing one segment, and at the end the results from all the segments are gathered at a single node. The C code below demonstrates this logic.

    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <mpi.h>

    int numV,         // number of vertices
        *todo,        // todo[v] = 1 while vertex v is still to be analysed
        numNodes,     // number of nodes (MPI processes)
        chunk,        // number of vertices handled by every node
        start, end,   // first and last vertex handled by this node
        myNode,       // ID (rank) of this node
        localMin[2],  // [0]: min for local node, [1]: vertex for that min
        globalMin[2]; // [0]: global min over all nodes, [1]: vertex for that min

    unsigned maxInt,  // stands in for an infinite distance
        *graph,       // this node's rows of the weight matrix, stored row-major
        *minDistance; // current minimum distance to every vertex

    double T1, T2;    // start, end time

    // Generation of the graph: every node runs the same deterministic
    // generator over the whole matrix but stores only its own rows, so
    // all nodes agree on a single symmetric weight matrix.
    void init(int ac, char **av) {
        int i, j;
        unsigned w;
        numV = atoi(av[1]);
        MPI_Init(&ac, &av);
        MPI_Comm_size(MPI_COMM_WORLD, &numNodes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myNode);
        chunk = numV / numNodes;  // numV is assumed to be a multiple of numNodes
        start = myNode * chunk;
        end = start + chunk - 1;
        maxInt = UINT_MAX >> 1;   // adding a small weight to this cannot overflow
        graph = malloc(chunk * numV * sizeof(unsigned));
        minDistance = malloc(numV * sizeof(unsigned));
        todo = malloc(numV * sizeof(int));
        srand(1);                 // same seed on every node, same matrix everywhere
        for (i = 0; i < numV; i++) {
            for (j = i; j < numV; j++) {
                w = (j == i) ? 0 : rand() % 21 + 1;
                if (i >= start && i <= end) graph[(i - start) * numV + j] = w;
                if (j >= start && j <= end) graph[(j - start) * numV + i] = w;
            }
        }
        for (i = 0; i < numV; i++) {
            todo[i] = 1;
            minDistance[i] = maxInt;
        }
        minDistance[0] = 0;       // vertex 0 is the source
    }

    int main(int ac, char **av) {
        int i, step, u;
        unsigned newDist;
        // Initialization process
        init(ac, av);
        if (myNode == 0)
            T1 = MPI_Wtime();
        for (step = 0; step < numV; step++) {
            // Find local minimum distance among this node's own vertices
            localMin[0] = maxInt;
            localMin[1] = start;
            for (i = start; i <= end; i++)
                if (todo[i] && minDistance[i] < (unsigned)localMin[0]) {
                    localMin[0] = minDistance[i];
                    localMin[1] = i;
                }
            // Reduce the local minima to the global minimum, then broadcast it
            MPI_Reduce(localMin, globalMin, 1, MPI_2INT, MPI_MINLOC,
                       0, MPI_COMM_WORLD);
            MPI_Bcast(globalMin, 1, MPI_2INT, 0, MPI_COMM_WORLD);
            u = globalMin[1];
            todo[u] = 0;          // u joins the shortest-path tree
            // Update the local part of the distance array; by symmetry,
            // w(u, v) can be read from this node's own row for v
            for (i = start; i <= end; i++)
                if (todo[i]) {
                    newDist = globalMin[0] + graph[(i - start) * numV + u];
                    if (newDist < minDistance[i])
                        minDistance[i] = newDist;
                }
        }
        // Gather the distance chunks on node 0 (node 0's own chunk is
        // already in place in its receive buffer)
        if (myNode == 0)
            MPI_Gather(MPI_IN_PLACE, chunk, MPI_UNSIGNED,
                       minDistance, chunk, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
        else
            MPI_Gather(minDistance + start, chunk, MPI_UNSIGNED,
                       minDistance, chunk, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
        if (myNode == 0) {
            T2 = MPI_Wtime();
            printf("time elapsed: %f s\n", T2 - T1);
        }
        MPI_Finalize();
        return 0;
    }
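
    Assuming an MPI implementation such as MPICH or Open MPI (references 4 and 5 below), the program can be built and launched with, for example, "mpicc Dijkstra.c -o Dijkstra.out" followed by "mpirun -np 8 ./Dijkstra.out 4000000", where the argument is the size of the graph; on the Cray system used in the next section the corresponding launcher is aprun.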
  • Simulation results and analysis

    The experiments have been conducted on Milner with the Intel compiler.

    Time measurements

    First, we ran our algorithm for different mesh sizes. Each time we varied the number of cores and measured the time needed for the algorithm to run. The measurements are shown in the tables below; each run was executed as "aprun -n Nodes ./Dijkstra.out mesh", where Nodes = 1..32 and mesh is the size of the graph (e.g. mesh = 4*10^6 means the graph has 4*10^6 vertices).

    mesh = 4*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.009083  0.010213  0.007118  0.008694  0.015567  0.034436

    mesh = 64*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.14132   0.150809  0.092254  0.062077  0.105193  0.1638634

    mesh = 256*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.564806  0.597349  0.352130  0.202681  0.189242  0.200911

    mesh = 900*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  1.984156  2.077290  1.213774  0.874406  0.85756   1.056482

  • Plotting the aforementioned values in the same diagram shows the following behaviour.

    For the largest meshes, the running time decreases as the number of cores increases until it reaches a minimum, and beyond that point it increases again because of the communication latency. For a medium-size network this reduction in running time is less pronounced, and for a small network the running time actually increases with the number of cores, because the communication latency outweighs the benefit of using more cores.

    Speedup measurements

    In our next step, we want to investigate the speedup of our algorithm. To do that, we divided the time measured on one node by the time needed on multiple nodes; for example, for mesh = 256*10^6 on 8 nodes the speedup is 0.564806 / 0.202681 ≈ 2.79. The measurements and the corresponding plot are given below:

    mesh = 4*10^6
    Nodes    1  2         4         8         16        32
    Speedup  1  0.889356  1.276060  1.044743  0.583477  0.263764

    mesh = 64*10^6
    Nodes    1  2          4         8        16         32
    Speedup  1  0.9370793  1.531857  2.27652  1.3434353  0.862425

    mesh = 256*10^6
    Nodes    1  2         4        8        16       32
    Speedup  1  0.945520  1.60397  2.78667  2.98457  2.81122

    mesh = 900*10^6
    Nodes    1  2       4        8         16      32
    Speedup  1  0.9551  1.63469  2.269147  2.3137  1.8780

  • The speedup increases with the number of cores until it reaches its maximum value, and then decreases, because beyond that point the communication latency outweighs the benefit of using more cores.

    As the network size increases, so does the number of cores needed to reach the maximum speedup.

  • Cost measurements

    In addition, we would like to investigate the cost as a function of the number of cores used. To do that, we multiplied the number of cores by the execution time measured on that many cores, i.e. cost = p * T(p); for example, for mesh = 64*10^6 on 4 nodes the cost is 4 * 0.092254 = 0.369016. The measurements and the corresponding diagram are presented below:

    mesh = 4*10^6
    Nodes       1        2         4         8         16        32
    Cost (sec)  1.87807  0.020426  0.028472  0.069552  0.249072  1.101952

    mesh = 64*10^6
    Nodes       1        2         4         8         16        32
    Cost (sec)  0.14132  0.301618  0.369016  0.496616  1.683088  5.2436288

    mesh = 256*10^6
    Nodes       1         2         4        8         16        32
    Cost (sec)  0.564806  1.194698  1.40852  1.621448  3.027872  6.429152

    mesh = 900*10^6
    Nodes       1         2        4         8         16        32
    Cost (sec)  1.984156  4.15458  4.855096  6.995248  13.72096  36.97687

  • The cost increases because the speedup (i.e. the benefit of a reduced running time) cannot offset the cost of using more cores.

  • References

    1) Wikipedia: http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

    2) Lecture notes by Erwin Laure: http://agenda.albanova.se/conferenceDisplay.py?confId=4384

    3) Han, Xiao Gang, Qin Lei Sun, and Jiang Wei Fan. "Parallel Dijkstra's Algorithm Based on Multi-Core and MPI." Applied Mechanics and Materials 441 (2014): 750-753.

    4) MPICH: http://www.mpich.org/

    5) Open MPI: http://www.open-mpi.org/

    6) Crauser, Andreas, et al. "A Parallelization of Dijkstra's Shortest Path Algorithm." Mathematical Foundations of Computer Science 1998. Springer Berlin Heidelberg, 1998. 722-731.

    7) Meyer, Ulrich, and Peter Sanders. "Δ-stepping: A Parallel Single Source Shortest Path Algorithm." Algorithms - ESA '98. Springer Berlin Heidelberg, 1998. 393-404.

    8) http://www.inf.ed.ac.uk/publications/thesis/online/IM040172.pdf