087 pagerank mapreduce pregel


Posted on 02-Jun-2018


  • 8/11/2019 087 Pagerank Mapreduce Pregel

    1/18

    Where we are

    Bill Howe, UW 1

    1. Graph Tasks
    2. Ex: Histograms
    3. Structural
    4. Ex: PageRank
    5. Traversal
    6. Patterns
    7. Pattern Languages
    8. Ex: PRISM
    9-10. Ex: Loops in MR
    11. Representations
    12. Ex: PageRank in MR
    12. Ex: PageRank in Pregel


    Big Graphs

    Social scale: 1 billion vertices, 100 billion edges

    Web scale: 50 billion vertices, 1 trillion edges

    Brain scale: 100 billion vertices, 100 trillion edges

    Gerhard et al., Frontiers in Neuroinformatics, 2011

    Web graph from the SNAP database (http://snap.stanford.edu/data)

    Paul Butler, Facebook, 2010 (https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919)

    Material adapted from Paul Burkhardt, Chris Waring


    MapReduce for PageRank

    class Mapper
      method Map(id n, vertex N)
        p ← N.PAGERANK / |N.ADJACENCYLIST|
        EMIT(id n, vertex N)                  // pass the graph structure along
        for all nodeid m in N.ADJACENCYLIST do
          EMIT(id m, value p)                 // send rank mass to each neighbor

    class Reducer
      method REDUCE(id m, [p1, p2, …])
        M ← null, s ← 0
        for all p in [p1, p2, …] do
          if ISVERTEX(p) then
            M ← p                             // recover the graph structure
          else
            s ← s + p                         // sum incoming rank contributions
        M.PAGERANK ← s * 0.85 + 0.15 / TOTALVERTICES
        EMIT(id m, vertex M)


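    Problems

    The entire state of the graph, adjacency lists included, is shuffled on every iteration.

    We only need to shuffle the new rank contributions, not the graph structure.

    Further, we have to control the iteration outside of MapReduce: a separate driver must launch one job per iteration and decide when to stop.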


    Pregel

    Originally from Google

    Open-source implementations: Apache Giraph, Stanford GPS, Jpregel, Hama

    Batch algorithms on large graphs

    Malewicz et al., SIGMOD 2010

    while any vertex is active and max iterations not reached:
      for each vertex (this loop is run in parallel):
        process messages from neighbors from the previous iteration
        send messages to neighbors
        set active flag appropriately


    6/17/2013 Bill Howe, Data Science, Autumn 2012 6

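    // Template arguments: vertex value, edge value, and message value types.
    class PageRankVertex : public Vertex<double, void, double> {
     public:
      virtual void Compute(MessageIterator* msgs) {
        // From superstep 1 on: sum incoming contributions and update the rank.
        if (superstep() >= 1) {
          double sum = 0;
          for (; !msgs->Done(); msgs->Next())
            sum += msgs->Value();
          *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
        }
        // For the first 30 supersteps, split the rank evenly over the out-edges.
        if (superstep() < 30) {
          const int64 n = GetOutEdgeIterator().size();
          SendMessageToAllNeighbors(GetValue() / n);
        } else {
          VoteToHalt();  // stop after a fixed number of supersteps
        }
      }
    };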

    Worked example: PageRank in Pregel on a five-vertex graph. Every vertex starts with rank 0.2 (= 1/5); in each superstep a vertex divides its rank evenly among its out-edges, then computes

    sum = sum(incoming values)

    rank = 0.15 / 5 + 0.85 * sum

    Superstep 0: ranks 0.2, 0.2, 0.2, 0.2, 0.2

    Messages sent: 0.1 per edge from a vertex with two out-edges, 0.066 per edge from a vertex with three, and 0.2 along each single out-edge

    Superstep 1: ranks 0.172, 0.03, 0.426, 0.34, 0.03 (the two vertices with no incoming edges get only 0.15 / 5 = 0.03)

    Messages sent: 0.015, 0.015, 0.01, 0.01, 0.01, 0.172, 0.34, 0.426

    Superstep 2: ranks 0.0513, 0.03, 0.69, 0.197, 0.03

    Messages sent: 0.015, 0.015, 0.01, 0.01, 0.01, 0.0513, 0.197, 0.69

    Superstep 3: ranks 0.0513, 0.03, 0.794, 0.095, 0.03

    Messages into the 0.794 vertex: 0.01, 0.095, 0.794

    Superstep 4: ranks 0.0513, 0.03, 0.794, 0.095, 0.03 (unchanged; the ranks have converged)