single-pass graph stream analytics with apache flink

95
@GraphDevroom Single-pass Graph Stream Analytics with Apache Flink Rethinking graph processing for dynamic data Vasiliki Kalavri <[email protected]> Paris Carbone <[email protected]> 1

Upload: paris-carbone

Post on 16-Apr-2017

105 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Single-pass Graph Stream Analytics with Apache Flink

Rethinking graph processing for dynamic data

Vasiliki Kalavri <[email protected]> Paris Carbone <[email protected]>

1

Page 2: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Real Graphs are dynamic

Graphs created by events happening in real-time • liking a post • buying a book • listening to a song • rating a movie • packet switching in computer networks • bitcoin transactions

Each event adds an edge to the graph

2

Page 3: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

3

Page 4: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

In a batch world

We create and analyze a snapshot of the real graph • all events / interactions / relationships that

happened between t0 and tn • the Facebook social network on January 30 2016 • user web logs gathered between March 1st 12:00 and 16:00 • retweets and replies for 24h after the announcement of the

death of David Bowie

4

Page 5: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Batch Graph Processing

5

Page 6: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

In a streaming world

• We receive and consume the events as they are happening, in real-time

• We analyze the evolving graph and receive results continuously

6

Page 7: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

7

Streaming Graph Processing

Page 8: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

8

Streaming Graph Processing

Page 9: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

9

Streaming Graph Processing

Page 10: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

10

Streaming Graph Processing

Page 11: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

11

Streaming Graph Processing

Page 12: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

12

Streaming Graph Processing

Page 13: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

13

Streaming Graph Processing

Page 14: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

14

Streaming Graph Processing

Page 15: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

15

Streaming Graph Processing

Page 16: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

16

Streaming Graph Processing

Page 17: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

17

Streaming Graph Processing

Page 18: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Sounds expensive?

Challenges • maintain the graph structure

• how to apply state updates efficiently?

• update the result • re-run the analysis for each event? • design an incremental algorithm? • run separate instances on multiple snapshots?

• compute only on most recent events

18

Page 19: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

19

The Apache Flink Stack

APIs

Execution

DataStreamDataSet

Distributed Dataflow

Deployment

• Bounded Data Sources • Structured Iterations • Blocking Operations

• Unbounded Data Sources • Asynchronous Iterations • Incremental Operations

Page 20: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Unifying Data Processing

Job Manager • scheduling tasks • monitoring/recovery

Client

• task pipelining • blocking

• execution plan building • optimisation

20

DataStreamDataSet

Distributed Dataflow

Deployment

HDFS

Kafka

DataSet<String> text = env.readTextFile(“hdfs://…”); text.map(…).groupReduce(…)…

DataStream<String> events = env.addSource(new KafkaConsumer(…)); events.map(…).filter(…).window(…).fold(…)…

Page 21: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom Graph Processing on

Apache Flink

21

DataStreamDataSet

Distributed Dataflow

Deployment

Gelly

• Static Graphs • Multi-Pass Algorithms • Full Computations

DataStream

Page 22: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Data Streams as ADTs

22

• Direct access to the execution graph / topology

• Suitable for engineers

• Abstract Data Type Transformations hide operator details

• Suitable data analysts and engineers

similar to: PCollection, DStream

DataStream

Page 23: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Nature of a DataStream Job

23

• Tasks are long running in a pipelined execution.

• State is kept within tasks.

• Transformations are applied per-record or per-window.

Execution Graph

unbounded data sinks

unbounded data sources

• operator parallelism• stream partitioning

Execution Properties

Page 24: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Working with DataStreams

24

Creation TransformationsDataStream<String> myStream =

-for supported data sources: env.addSource(new FlinkKafkaConsumer<String>(…)); env.addSource(new RMQSource<String>(…)); env.addSource(new TwitterSource(propsFile)); env.socketTextStream(…);

-for testing: env.fromCollection(…); env.fromElements(…);

-for adding any custom source: env.addSource(MyCustomSource(…));

PropertiesmyStream.setParallelism(3)

myStream.broadcast(); .rebalance(); .forward();

.keyBy(key);

partitioning

partition stream and operator state by key

myStream.map(…); myStream.flatMap(…); myStream.filter(…); myStream.union(myOtherStream);

-for aggregations on partitioned-by-key streams:

myKeyStream.reduce(…); myKeyStream.fold(…); myKeyStream.sum(…);

Page 25: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Example

25

env.setParallelism(2); //default parallelism DataStream<Tuple2<String, Integer>> counts = env

.socketTextStream("localhost", 9999) .flatMap(new Splitter()) //transformation .keyBy(0) //partitioning .sum(1) //rolling aggregation

.setParallelism(4); counts.print();

“cool, gelly is cool”

<“gelly", 1><“is”, 1> <“cool”,1><“cool”,1>

<“is”, 1> <“gelly”, 1>

<“cool”,2> <“cool”,1>

printsum

flatMap

Page 26: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Working with Windows

26

Why windows? We are often interested in fresh data!

Highlight: Flink can form and trigger windows consistently under different notions of time and deal with late events!

#sec40 80

SUM #2

0

SUM #1

20 60 100

#sec40 80

SUM #3

SUM #2

0

SUM #1

20 60 100

120

15 38 65 88

15 38

38 65

65 88

15 38 65 88

110 120

myKeyStream.timeWindow( Time.of(60, TimeUnit.SECONDS), Time.of(20, TimeUnit.SECONDS));

1) Sliding windows

2) Tumbling windowsmyKeyStream.timeWindow( Time.of(60, TimeUnit.SECONDS));

window buckets/panes

Page 27: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Example

27

env.setParallelism(2); //default parallelism DataStream<Tuple2<String, Integer>> counts = env

.socketTextStream("localhost", 9999) .flatMap(new Splitter()) //transformation .keyBy(0) //partitioning

.window(Time.of(5, TimeUnit.MINUTES)) .sum(1) //rolling aggregation

.setParallelism(4); counts.print();

10:48 - “cool, gelly is cool”

printwindow sumflatMap

11:01 - “dataflow is cool too”

<“gelly”,1>… <“cool”,2>

<“dataflow”,1>… <“cool”,1>

Page 28: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom Single-Pass Graph Streaming

with Windows• Each event represents an edge addition

• Each edge is processed once and thrown away, i.e. the graph structure is not explicitly maintained

• The state maintained corresponds to a graph summary, a continuously improving property, an aggregation

• Recent events can be grouped in a graph window and processed independently

28

Page 29: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What’s the benefit?

• Get results faster • No need to wait for the job to finish • Sometimes, early approximations are better than late exact

answers • Get results continuously

• Process unbounded number of events • Use less memory

• single-pass algorithms don’t store the graph structure • run computations on a graph summary

29

Page 30: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

30

Page 31: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

31

Page 32: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

32

Page 33: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

33

Page 34: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

34

Page 35: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

35

Page 36: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• transformations, e.g. mapping, filtering vertex / edge values, reverse edge direction

• continuous aggregations, e.g. degree distribution

36

Page 37: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

Streaming Degrees Distribution#v

ertic

es

degree

37

Page 38: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

38

Page 39: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

39

Page 40: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

40

Page 41: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

Streaming Degrees Distribution

0

2

4

6

1 2 3 4

#ver

tices

degree

41

Page 42: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

42

Page 43: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

Streaming Degrees Distribution

0

2

4

6

1 2 3 4

#ver

tices

degree

43

Page 44: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

44

Page 45: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

Streaming Degrees Distribution

0

2

4

6

1 2 3 4

#ver

tices

degree

45

Page 46: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

0

2

4

6

1 2 3 4

#ver

tices

degree

Streaming Degrees Distribution

46

Page 47: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• spanners for distance estimation • sparsifiers for cut estimation • sketches for homomorphic properties

graph summary

algorithm algorithm~R1 R2

47

Page 48: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

What can you do in this model?

• neighborhood aggregations on windows, e.g. triangle counting, clustering coefficient (no iterations… yet!)

48

Page 49: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Examples

49

Page 50: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Batch Connected Components

• State: the graph and a component ID per vertex (initially equal to vertex ID)

• Iterative Computation: For each vertex:

• choose the min of neighbors’ component IDs and own component ID as new ID

• if component ID changed since last iteration, notify neighbors

50

Page 51: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

i=0

Batch Connected Components

51

Page 52: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

8

i=13 4

1 4

4 5

2 4

1 2 4 5

7 8

6 8

6 7

1 1

2

6

6

Batch Connected Components

52

Page 53: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

11

2

2

6

6

6

i=2

1

1

1 2

1 2 6

6

6

1

1

Batch Connected Components

53

Page 54: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

11

1

1

6

6

6

i=3

Batch Connected Components

54

Page 55: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Streaming Connected Components

• State: a disjoint set data structure for the components

• Computation: For each edge

• if seen for the 1st time, create a component with ID the min of the vertex IDs

• if in different components, merge them and update the component ID to the min of the component IDs

• if only one of the endpoints belongs to a component, add the other one to the same component

55

Page 56: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

31

52

54

76

86

ComponentID Vertices

1

43

2

5

6

7

8

56

Page 57: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

31

52

54

76

86

42

ComponentID Vertices

1 1, 3

1

43

2

5

6

7

8

57

Page 58: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

31

52

54

76

86

42

ComponentID Vertices

43

2 2, 5

1 1, 3

1

43

2

5

6

7

8

58

Page 59: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

31

52

54

76

86

42

43

87

ComponentID Vertices

2 2, 4, 5

1 1, 3

1

43

2

5

6

7

8

59

Page 60: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

31

52

54

76

86

42

43

87

41

ComponentID Vertices

2 2, 4, 5

1 1, 3

6 6, 7

1

43

2

5

6

7

8

60

Page 61: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

52

54

76

86

42

43

87

41

ComponentID Vertices

2 2, 4, 5

1 1, 3

6 6, 7, 8

1

43

2

5

6

7

8

61

Page 62: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

54

76

86

42

43

87

41 ComponentID Vertices

2 2, 4, 5

1 1, 3

6 6, 7, 8

1

43

2

5

6

7

8

62

Page 63: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

76

86

42

43

87

41

ComponentID Vertices

2 2, 4, 5

1 1, 3

6 6, 7, 8

1

43

2

5

6

7

8

63

Page 64: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

76

86

42

43

87

41

ComponentID Vertices

6 6, 7, 8

1 1, 2, 3, 4, 5

1

43

2

5

6

7

8

64

Page 65: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

86

42

43

87

41

ComponentID Vertices

6 6, 7, 8

1 1, 2, 3, 4, 5

1

43

2

5

6

7

8

65

Page 66: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

42

43

87

41

ComponentID Vertices

6 6, 7, 8

1 1, 2, 3, 4, 5

1

43

2

5

6

7

8

66

Page 67: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom Distributed Streaming Connected

Components

67

Page 68: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Streaming Bipartite Detection

Similar to connected components, but

• each vertex is also assigned a sign, (+) or (-)

• edge endpoints must have different signs

• when merging components, if flipping all signs doesn’t work => the graph is not bipartite

68

Page 69: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

1

43

2

5

6

7

(+) (-)

(+)(-)

(+) (-)

(+)

Cid=1

Cid=5

Streaming Bipartite Detection

69

Page 70: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

3 5

1

43

2

5

6

7

(+) (-)

(+)(-)

(+) (-)

(+)

Cid=1

Cid=5

Streaming Bipartite Detection

70

Page 71: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

3 5

1

43

2

5

6

7

(+) (-)

(+)(-)

(+) (-)

(+)

Cid=1

Cid=5

Streaming Bipartite Detection

71

Page 72: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Cid=1

1

43

2

5

6

7

(+) (-)

(-)(+)

(+) (-)

(-)

3 5

Streaming Bipartite Detection

72

Page 73: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

3 7

Cid=1

1

43

2

5

6

7

(+) (-)

(-)(+)

(+) (-)

(-)Can’t flip signs and stay consistent

=> not bipartite!

Streaming Bipartite Detection

73

Page 74: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

The GraphStream

74

DataStreamDataSet

Distributed Dataflow

Deployment

Gelly Gelly-Stream

• Static Graphs • Multi-Pass Algorithms • Full Computations

• Dynamic Graphs • Single-Pass Algorithms • Incremental Computations

DataStream

Page 75: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Introducing Gelly-Stream

75

• Gelly-Stream enriches the DataStream API with two new additional ADTs:

• GraphStream:

• A representation of a data stream of edges.

• Edges can have state (e.g. weights).

• Supports property streams, transformations and aggregations.

• GraphWindow:

• A “time-slice” of a graph stream.

• It enables neighborhood aggregations (and iterations in the future)

Page 76: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Graph Property Streams

76

AB

C D

A B C D A CGraph Stream:

.getEdges()

.getVertices()

.numberOfVertices()

.numberOfEdges()

.getDegrees()

.inDegrees()

.outDegrees()

GraphStream -> DataStream

Page 77: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

.mapEdges();

.distinct();

.filterVertices();

.filterEdges();

.reverse();

.undirected();

.union();

Transform Graph Streams

77

AB

C D

A B C D A CGraph Stream:

GraphStream -> GraphStream

Page 78: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Graph Stream Aggregations

78

result aggregate

property streamgraph stream

(window) fold

combine

fold

reduce

partitioned aggregates

global aggregates

edges

agg

global aggregates can be persistent or transient

graphStream.aggregate(new MyGraphAggregation(window, update, fold, combine, merge))

Page 79: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Graph Stream Aggregations

79

result aggregate

property streamgraph stream

(window) fold

combine merge

graphStream.aggregate(new MyGraphAggregation(window, fold, combine, merge))

fold

reduce map

partitioned aggregates

global aggregates

edges

agg

Page 80: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

80

graph stream

combine merge

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge))

reduce map31

52

1

43

2

5

6

7

8

Page 81: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

81

graph stream

combine mergereduce map

{1,3}

{2,5}

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 82: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

82

graph stream

combine mergereduce map

{1,3}

{2,5}

54

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 83: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

83

graph stream

combine mergereduce map

{1,3}

{2,5}

{4,5}76

86

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 84: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

84

graph stream

combine mergereduce map

{1,3}

{2,5}

{4,5}

{6,7}

{6,8}

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 85: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

85

graph stream

combine mergereduce map

TODO:: show blocking reduce instead?

{2,5}{6,8}

{1,3}{4,5}

{6,7}

3

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 86: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

86

graph stream

combine mergereduce map

{1,3}{2,4,5}

{6,7,8}

3

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 87: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

87

graph stream

combine mergereduce map

{1,3}{2,4,5}

{6,7,8}

342

43

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 88: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

88

graph stream

combine mergereduce map

{1,3}{2,4,5}

{6,7,8}

3

{2,4}

{3,4}

41

87

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 89: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

89

graph stream

combine mergereduce map

{1,3}{2,4,5}

{6,7,8}

3

{1,2,4}

{3,4}{7,8}

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 90: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

90

graph stream

combine mergereduce map

{1,2,4,5}{6,7,8}

2

{3,4}{7,8}

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 91: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Connected Components

91

graph stream

combine mergereduce map

{1,2,3,4,5}{6,7,8}

2

graphStream.aggregate(new ConnectedComponents(window, update, fold, combine, merge)) 1

43

2

5

6

7

8

Page 92: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Slicing Graph Streams

92

graphStream.slice(Time.of(1, MINUTE));

11:40 11:41 11:42 11:43

Page 93: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Aggregating Slices

93

graphStream.slice(Time.of(1, MINUTE), direction)

.reduceOnEdges();

.foldNeighbors();

.applyOnNeighbors();

• Slicing collocates edges by vertex information

• Neighbourhood aggregations are now enabled on sliced graphs

source

target

Aggregations

Page 94: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Finding matches nearby

94

graphStream.slice(Time.of(1, MINUTE)).applyOnNeighbors(FindPairs())

slice applyOnNeighbors

TODO: make it more interactive with transitions

Page 95: Single-Pass Graph Stream Analytics with Apache Flink

@GraphDevroom

Summary

• Many graph analysis problems can be covered in single-pass

• Processing dynamic graphs requires an incremental graph processing model

• We introduce Gelly-Stream, a simple yet powerful library for graph streams