Parallelization of Dijkstra's Algorithm


  • Programming project: Dijkstra's algorithm

    Summer school: "Introduction to High Performance Computing"

    Center for High-Performance Computing & KTH School of Computer Science and Communication

    Salvator Gkalea, Department ICT/ES, KTH
    Dionysios Zelios, Department of Physics, Stockholm University
    [email protected], [email protected]

    Tutor: Stefano Markidis

    Date: October 2014

  • Contents

    Abstract
    Introduction
    Dijkstra's Parallelization
    Simulation results and analysis
    References

  • Abstract

    Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest-path tree. The algorithm is often used in routing and as a subroutine in other graph algorithms. In this project we investigate the parallelization of this algorithm and its speedup against the sequential version. The Message Passing Interface (MPI) is used to parallelize the single-source shortest-path algorithm.

    Introduction

    Dijkstra's algorithm determines the shortest path from a single source vertex to every other vertex. It can also be used to find the cost of the shortest path from a single source vertex to a single destination vertex, by stopping the algorithm once the shortest path to the destination vertex has been determined. For instance, if the vertices of the graph represent cities and edge costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities.

    A pseudo-code representation of Dijkstra's sequential single-source shortest-paths algorithm is given below. The algorithm proceeds as follows:

  • 1.  function Dijkstra(V, E, w, s)
    2.  begin
    3.      VT := {s};
    4.      for all v ∈ (V - VT) do
    5.          if (s, v) ∈ E then l[v] := w(s, v);
    6.          else l[v] := ∞;
            end for
    7.      while VT ≠ V do
    8.      begin
    9.          find a vertex u such that l[u] := min{ l[v] | v ∈ (V - VT) };
    10.         VT := VT ∪ {u};
    11.         for all v ∈ (V - VT) do
    12.             l[v] := min{ l[v], l[u] + w(u, v) };
    13.     end while
    14. end Dijkstra

    Suppose we are given a weighted graph G = (V, E, w), where V is the set of vertices, E the set of edges and w the weight function. The pseudo-code above computes the shortest path from the source vertex s to every other vertex v. For every vertex v in (V - VT), the algorithm stores in l[v] the minimum cost found so far to reach v from s. In lines 4-6 the procedure initializes the l[] array with the weight of the edge from s to each vertex, or ∞ if no such edge exists. In the remaining lines, the algorithm repeatedly finds the vertex u closest to s among all those not yet examined (line 9), adds it to VT, and then updates the minimum distances in the l[] array (lines 11-12).
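
    To make the pseudo-code concrete, here is a minimal sequential C sketch of it; the adjacency-matrix representation, the fixed source vertex and the small example graph are illustrative assumptions, not part of the original project code.

    #include <stdio.h>
    #include <limits.h>

    #define NV 5                /* number of vertices (example size, assumed) */
    #define INF (INT_MAX / 2)   /* "infinity" that cannot overflow when added to */

    /* Sequential Dijkstra: fills l[] with shortest distances from source s.
       graph[u][v] holds w(u, v), or INF if there is no edge (u, v). */
    void dijkstra(int graph[NV][NV], int s, int l[NV]) {
        int inT[NV] = {0};                      /* inT[v] = 1 once v is in VT */
        for (int v = 0; v < NV; v++)            /* lines 3-6: initialize l[] */
            l[v] = (v == s) ? 0 : graph[s][v];
        inT[s] = 1;
        for (int step = 1; step < NV; step++) { /* line 7: until VT = V */
            int u = -1;
            for (int v = 0; v < NV; v++)        /* line 9: closest vertex not in VT */
                if (!inT[v] && (u < 0 || l[v] < l[u]))
                    u = v;
            inT[u] = 1;                         /* line 10: VT := VT U {u} */
            for (int v = 0; v < NV; v++)        /* lines 11-12: relax through u */
                if (!inT[v] && l[u] + graph[u][v] < l[v])
                    l[v] = l[u] + graph[u][v];
        }
    }

    int main(void) {
        /* small symmetric example graph, weights chosen arbitrarily */
        int g[NV][NV] = {
            {0,   4,   1,   INF, INF},
            {4,   0,   2,   5,   INF},
            {1,   2,   0,   8,   10 },
            {INF, 5,   8,   0,   2  },
            {INF, INF, 10,  2,   0  },
        };
        int l[NV];
        dijkstra(g, 0, l);
        for (int v = 0; v < NV; v++)
            printf("shortest distance 0 -> %d: %d\n", v, l[v]);
        return 0;
    }

    The scan in line 9 and the relaxation in lines 11-12 dominate the running time, and these are exactly the parts that the parallel version distributes across processes.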

    The main idea behind parallelizing this problem is that every process finds, among the vertices assigned to it, the one that is closest to the source. A reduction is then performed to determine the globally closest vertex, and the result is broadcast to all processes. Finally, each process updates its local part of the distance array (the l[] array). A sketch of this reduce-and-broadcast step is given after the function list below.

    This kind of parallel computing can be achieved with a few basic MPI communication functions. These functions are described below and are used to parallelize the single-source shortest-paths algorithm.

  • MPI_Init — Initializes the parallel environment

    MPI_Finalize — Ends the parallel environment and releases system resources

    MPI_Comm_size — Returns the number of parallel computing units (processes)

    MPI_Comm_rank — Returns the identification number (rank) of the current computing unit

    MPI_Reduce — Reduces values from all processes to a single value

    MPI_Bcast — Broadcasts a message from the process with rank "root" to all other processes of the communicator

    MPI_Gather — Gathers together values from a group of processes
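
    The reduce-and-broadcast step mentioned above hinges on the MPI_MINLOC reduction operation, which combines (value, index) pairs by keeping the smallest value together with the index that produced it. The following is a minimal self-contained sketch of just that step; the made-up (distance, vertex) pair is our own illustration, not the report's code.

    #include <stdio.h>
    #include <mpi.h>

    int main(int ac, char **av) {
        int rank;
        MPI_Init(&ac, &av);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each process contributes a (distance, vertex) pair; here the
           values are fabricated purely for illustration */
        int localMin[2] = { 100 - 7 * rank, rank };  /* [0] distance, [1] vertex */
        int globalMin[2];

        /* rank 0 reduces the pairs, keeping the smallest distance and the
           vertex that attains it ... */
        MPI_Reduce(localMin, globalMin, 1, MPI_2INT, MPI_MINLOC, 0, MPI_COMM_WORLD);
        /* ... and broadcasts the winning pair back to every process */
        MPI_Bcast(globalMin, 1, MPI_2INT, 0, MPI_COMM_WORLD);

        printf("process %d: global min distance %d at vertex %d\n",
               rank, globalMin[0], globalMin[1]);
        MPI_Finalize();
        return 0;
    }

    The same effect can be obtained in a single call with MPI_Allreduce; the two-call form is shown here because it matches the reduce-then-broadcast description above.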

  • Dijkstra's Parallelization

    The pseudo-code above can be parallelized based on line 9, where the minimum distance among all the vertices is computed, or based on line 11, where the algorithm folds the overall minimum into the local minimum-distance array. The classic approach is to break the data to be computed into segments; each node is then responsible for processing one segment, and at the end the results from all the segments are gathered at a single node. The C code below demonstrates this logic.

    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <mpi.h>

    int numV,         // number of vertices
        *todo,        // todo[v] = 1 while vertex v is still to be analysed
        numNodes,     // number of nodes (MPI processes)
        chunk,        // number of vertices handled by every node
        start, end,   // first and last vertex handled by this node
        myNode,       // ID (rank) of this node
        localMin[2],  // [0]: min for local node, [1]: vertex for that min
        globalMin[2]; // [0]: global min over all nodes, [1]: vertex for that min

    unsigned maxInt,  // stands in for an infinite distance
        *graph,       // this node's rows of the weight matrix, stored row-major
        *minDistance; // current minimum distance to every vertex

    double T1, T2;    // start, end time

    // Generation of the graph: every node runs the same deterministic
    // generator over the whole matrix but stores only its own rows, so
    // all nodes agree on a single symmetric weight matrix.
    void init(int ac, char **av) {
        int i, j;
        unsigned w;
        numV = atoi(av[1]);
        MPI_Init(&ac, &av);
        MPI_Comm_size(MPI_COMM_WORLD, &numNodes);
        MPI_Comm_rank(MPI_COMM_WORLD, &myNode);
        chunk = numV / numNodes;  // numV is assumed to be a multiple of numNodes
        start = myNode * chunk;
        end = start + chunk - 1;
        maxInt = UINT_MAX >> 1;   // adding a small weight to this cannot overflow
        graph = malloc(chunk * numV * sizeof(unsigned));
        minDistance = malloc(numV * sizeof(unsigned));
        todo = malloc(numV * sizeof(int));
        srand(1);                 // same seed on every node, same matrix everywhere
        for (i = 0; i < numV; i++) {
            for (j = i; j < numV; j++) {
                w = (j == i) ? 0 : rand() % 21 + 1;
                if (i >= start && i <= end) graph[(i - start) * numV + j] = w;
                if (j >= start && j <= end) graph[(j - start) * numV + i] = w;
            }
        }
        for (i = 0; i < numV; i++) {
            todo[i] = 1;
            minDistance[i] = maxInt;
        }
        minDistance[0] = 0;       // vertex 0 is the source
    }

    int main(int ac, char **av) {
        int i, step, u;
        unsigned newDist;
        // Initialization process
        init(ac, av);
        if (myNode == 0)
            T1 = MPI_Wtime();
        for (step = 0; step < numV; step++) {
            // Find local minimum distance among this node's own vertices
            localMin[0] = maxInt;
            localMin[1] = start;
            for (i = start; i <= end; i++)
                if (todo[i] && minDistance[i] < (unsigned)localMin[0]) {
                    localMin[0] = minDistance[i];
                    localMin[1] = i;
                }
            // Reduce the local minima to the global minimum, then broadcast it
            MPI_Reduce(localMin, globalMin, 1, MPI_2INT, MPI_MINLOC,
                       0, MPI_COMM_WORLD);
            MPI_Bcast(globalMin, 1, MPI_2INT, 0, MPI_COMM_WORLD);
            u = globalMin[1];
            todo[u] = 0;          // u joins the shortest-path tree
            // Update the local part of the distance array; by symmetry,
            // w(u, v) can be read from this node's own row for v
            for (i = start; i <= end; i++)
                if (todo[i]) {
                    newDist = globalMin[0] + graph[(i - start) * numV + u];
                    if (newDist < minDistance[i])
                        minDistance[i] = newDist;
                }
        }
        // Gather the distance chunks on node 0 (node 0's own chunk is
        // already in place in its receive buffer)
        if (myNode == 0)
            MPI_Gather(MPI_IN_PLACE, chunk, MPI_UNSIGNED,
                       minDistance, chunk, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
        else
            MPI_Gather(minDistance + start, chunk, MPI_UNSIGNED,
                       minDistance, chunk, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
        if (myNode == 0) {
            T2 = MPI_Wtime();
            printf("time elapsed: %f s\n", T2 - T1);
        }
        MPI_Finalize();
        return 0;
    }
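
    Assuming an MPI implementation such as MPICH or Open MPI (references 4 and 5 below), the program can be built and launched with, for example, "mpicc Dijkstra.c -o Dijkstra.out" followed by "mpirun -np 8 ./Dijkstra.out 4000000", where the argument is the size of the graph; on the Cray system used in the next section the corresponding launcher is aprun.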
  • Simulation results and analysis

    The experiments have been conducted on Milner with the Intel compiler.

    Time measurements

    First, we ran our algorithm for different mesh sizes. Each time we varied the number of cores and measured the time needed for the algorithm to run. The measurements are shown in the tables below; each run was executed as "aprun -n Nodes ./Dijkstra.out mesh", where Nodes = 1..32 and mesh is the size of the graph (e.g. mesh = 4*10^6 means the graph has 4*10^6 vertices).

    mesh = 4*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.009083  0.010213  0.007118  0.008694  0.015567  0.034436

    mesh = 64*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.14132   0.150809  0.092254  0.062077  0.105193  0.1638634

    mesh = 256*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  0.564806  0.597349  0.352130  0.202681  0.189242  0.200911

    mesh = 900*10^6
    Nodes       1         2         4         8         16        32
    Time (sec)  1.984156  2.077290  1.213774  0.874406  0.85756   1.056482

  • Plotting the aforementioned values in the same diagram shows the following behaviour.

    For the largest meshes, the running time decreases as the number of cores increases until it reaches a minimum, and beyond that point it increases again because of the communication latency. For a medium-size network this reduction in running time is less pronounced, and for a small network the running time actually increases with the number of cores, because the communication latency outweighs the benefit of using more cores.

    Speedup measurements

    In our next step, we want to investigate the speedup of our algorithm. To do that, we divided the time measured on one node by the time needed on multiple nodes; for example, for mesh = 256*10^6 on 8 nodes the speedup is 0.564806 / 0.202681 ≈ 2.79. The measurements and the corresponding plot are given below:

    mesh = 4*10^6
    Nodes    1  2         4         8         16        32
    Speedup  1  0.889356  1.276060  1.044743  0.583477  0.263764

    mesh = 64*10^6
    Nodes    1  2          4         8        16         32
    Speedup  1  0.9370793  1.531857  2.27652  1.3434353  0.862425

    mesh = 256*10^6
    Nodes    1  2         4        8        16       32
    Speedup  1  0.945520  1.60397  2.78667  2.98457  2.81122

    mesh = 900*10^6
    Nodes    1  2       4        8         16      32
    Speedup  1  0.9551  1.63469  2.269147  2.3137  1.8780

  • The speedup increases with the number of cores until it reaches its maximum value, and then decreases, because beyond that point the communication latency outweighs the benefit of using more cores.

    As the network size increases, so does the number of cores needed to reach the maximum speedup.

  • Cost measurements

    In addition, we would like to investigate the cost as a function of the number of cores used. To do that, we multiplied the number of cores by the execution time measured on that many cores, i.e. cost = p * T(p); for example, for mesh = 64*10^6 on 4 nodes the cost is 4 * 0.092254 = 0.369016. The measurements and the corresponding diagram are presented below:

    mesh = 4*10^6
    Nodes       1        2         4         8         16        32
    Cost (sec)  1.87807  0.020426  0.028472  0.069552  0.249072  1.101952

    mesh = 64*10^6
    Nodes       1        2         4         8         16        32
    Cost (sec)  0.14132  0.301618  0.369016  0.496616  1.683088  5.2436288

    mesh = 256*10^6
    Nodes       1         2         4        8         16        32
    Cost (sec)  0.564806  1.194698  1.40852  1.621448  3.027872  6.429152

    mesh = 900*10^6
    Nodes       1         2        4         8         16        32
    Cost (sec)  1.984156  4.15458  4.855096  6.995248  13.72096  36.97687

  • The cost increases because the speedup (i.e. the benefit of a reduced running time) cannot offset the cost of using more cores.

  • References

    1) Wikipedia: http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

    2) Lecture notes by Erwin Laure: http://agenda.albanova.se/conferenceDisplay.py?confId=4384

    3) Han, Xiao Gang, Qin Lei Sun, and Jiang Wei Fan. "Parallel Dijkstra's Algorithm Based on Multi-Core and MPI." Applied Mechanics and Materials 441 (2014): 750-753.

    4) MPICH: http://www.mpich.org/

    5) Open MPI: http://www.open-mpi.org/

    6) Crauser, Andreas, et al. "A Parallelization of Dijkstra's Shortest Path Algorithm." Mathematical Foundations of Computer Science 1998. Springer Berlin Heidelberg, 1998. 722-731.

    7) Meyer, Ulrich, and Peter Sanders. "Δ-stepping: A Parallel Single Source Shortest Path Algorithm." Algorithms - ESA '98. Springer Berlin Heidelberg, 1998. 393-404.

    8) http://www.inf.ed.ac.uk/publications/thesis/online/IM040172.pdf