Nikolay Sakharnykh – Hugo Braun; May 10, 2017 EFFICIENT MAXIMUM FLOW ALGORITHM


Page 1: EFFICIENT MAXIMUM FLOW ALGORITHM - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/S7370... · 2017-05-16 · Ford-Fulkerson & variants use too many graph traversals. 10 PUSH-RELABEL

Nikolay Sakharnykh – Hugo Braun; May 10, 2017

EFFICIENT MAXIMUM FLOW ALGORITHM

Page 2: MAXIMUM FLOW (Definition)

• Directed graph

• Flow capacities on edges

• Maximum flow from s to t?

Example: How much instant power can Palo Alto get using that electric grid?

[Figure: electric grid drawn as a directed graph with nodes Power plant, SF, SJ and PA; edge capacities 20, 30, 8, +∞, 25, 12 and 1]
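The question on this slide can be stated formally. The notation below (capacity c, flow f, flow value |f|, edge set E, vertex set V) is the usual textbook notation and is an assumption here, since the slide itself gives no formulas.

```latex
\begin{aligned}
\text{maximize}\quad & |f| = \sum_{v:(s,v)\in E} f(s,v) \;-\; \sum_{v:(v,s)\in E} f(v,s) \\
\text{subject to}\quad & 0 \le f(u,v) \le c(u,v) && \forall (u,v)\in E \\
& \sum_{u:(u,v)\in E} f(u,v) = \sum_{w:(v,w)\in E} f(v,w) && \forall v\in V\setminus\{s,t\}
\end{aligned}
```

The first constraint is the capacity limit on each edge. The second says that every vertex other than s and t forwards exactly what it receives.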

Page 3: MAXIMUM FLOW (Applications)

• Image segmentation

• Community detection

Page 4: MAXIMUM FLOW SOLVERS

AUGMENTING PATHS: iteratively find a new augmenting path

• Ford–Fulkerson

• Edmonds–Karp

• Dinic’s/MPM

PREFLOW: push flow locally in a preflow graph

• Push-relabel and its variants

OTHER

• Linear programming

Page 5: AGENDA

Edmonds-Karp

Push-relabel

MPM

Page 6: FORD-FULKERSON (Workflow)

while a path p of capacity c > 0 exists from s to t:
    maxflow += c
    for all edges in path p:
        edge.capacity -= c
        add reverse edge from edge.destination to edge.source of capacity c

Augmenting path on first iteration, path of capacity min(1,3,1) = 1

[Figure: graph with vertices s, a, b, c, d, t; edge capacities 1, 1, 3, 3, 5, 1, 1]
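The Ford-Fulkerson workflow can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function name `ford_fulkerson`, the depth-first path search, the dictionary-based residual graph and the two small test graphs are all assumptions made for the example, and s and t are assumed to be distinct.

```python
from collections import defaultdict


def ford_fulkerson(edges, s, t):
    """Return the maximum flow value from s to t.

    edges is a list of (source, destination, capacity) triples (hypothetical input format).
    """
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap

    def find_path(u, visited):
        # Depth-first search for any path to t using edges with capacity > 0.
        if u == t:
            return []
        visited.add(u)
        for v, cap in list(residual[u].items()):
            if cap > 0 and v not in visited:
                rest = find_path(v, visited)
                if rest is not None:
                    return [(u, v)] + rest
        return None

    maxflow = 0
    while True:
        path = find_path(s, set())
        if path is None:            # no augmenting path left
            return maxflow
        c = min(residual[u][v] for u, v in path)
        maxflow += c
        for u, v in path:
            residual[u][v] -= c     # edge.capacity -= c
            residual[v][u] += c     # reverse edge of capacity c


# Hypothetical graph; the second augmenting path has to use a reverse edge.
diamond = [("s", "a", 1), ("s", "b", 1), ("a", "b", 1), ("a", "t", 1), ("b", "t", 1)]
print(ford_fulkerson(diamond, "s", "t"))  # 2
```

On the `diamond` graph the first path found is s, a, b, t. The second one, s, b, a, t, only exists because of the reverse edge from b to a, which is why the workflow adds reverse edges.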

Page 7: FORD-FULKERSON (Workflow)

Augmenting path on first iteration, path of capacity min(1,3,1) = 1

[Figure: graph with vertices s, a, b, c, d, t; edge capacities 1, 2, 1, 5, 1, 1, 1, 3]

Page 8: FORD-FULKERSON (Workflow)

Augmenting path on first iteration, path of capacity min(1,3,1) = 1

Augmenting path on second iteration, path of capacity min(1,1,1,3,5) = 1

[Figure: graph with vertices s, a, b, c, d, t; edge capacities 1, 2, 1, 5, 1, 1, 1, 3]

Page 9: EDMONDS-KARP

Edmonds-Karp: variation of Ford-Fulkerson

Main idea: use the shortest augmenting path

One augmenting path needs one BFS

Wikipedia graph: ~5000 augmenting paths

[Figure: graph with vertices s, a, b, c, d, t; edge capacities 1, 2, 1, 5, 1, 1, 1, 3]

Ford-Fulkerson & variants use too many graph traversals
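To make the "one BFS per augmenting path" cost concrete, here is a small Edmonds-Karp sketch that also counts its traversals. The function name `edmonds_karp`, the returned pair and the test graph are assumptions for illustration only.

```python
from collections import defaultdict, deque


def edmonds_karp(edges, s, t):
    """Return (max flow value, number of BFS traversals performed)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap

    maxflow, bfs_runs = 0, 0
    while True:
        # One BFS from s finds a shortest augmenting path, if any exists.
        bfs_runs += 1
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in list(residual[u].items()):
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return maxflow, bfs_runs

        # Walk back from t to s to recover the path, then augment along it.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        c = min(residual[u][v] for u, v in path)
        maxflow += c
        for u, v in path:
            residual[u][v] -= c
            residual[v][u] += c


# Hypothetical graph with two shortest augmenting paths.
diamond = [("s", "a", 1), ("s", "b", 1), ("a", "b", 1), ("a", "t", 1), ("b", "t", 1)]
print(edmonds_karp(diamond, "s", "t"))  # (2, 3)
```

The two augmenting paths cost two traversals, plus a final one that fails to reach t. On a graph with thousands of augmenting paths the traversal count grows accordingly.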

Page 10: PUSH-RELABEL

Page 11: PUSH-RELABEL (Workflow)

Saturate all out-edges of s, create reverse edges
𝓁[s] = number of vertices
𝓁[v] = 0 for all v ∈ V \ {s}
while there is an applicable push or relabel operation:
    execute the operation

push(u, v):
    if (e[u] > 0 and 𝓁[u] == 𝓁[v] + 1)
        push min(e[u], residual capacity of (u, v)) amount of flow from u to v

relabel(u):
    if (e[u] > 0 and 𝓁[u] <= 𝓁[v] for all current neighbors)
        𝓁[u] = minimum 𝓁[v] among neighbors + 1

[Figure: vertices s (l=?), a (l=?, e=?), b (l=?, e=?), d (l=?, e=?); edge capacities 1, 1, 1]
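A sequential sketch of the push-relabel workflow follows. The slide leaves the processing order open, so the FIFO queue of active vertices used here is an assumption, as are the function name `push_relabel` and the test graph. A push moves at most the residual capacity of the edge.

```python
from collections import defaultdict, deque


def push_relabel(edges, s, t):
    """Return the max flow value from s to t (sequential FIFO variant)."""
    residual = defaultdict(lambda: defaultdict(int))
    vertices = {s, t}
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0            # make sure the reverse edge exists
        vertices.update((u, v))

    label = {v: 0 for v in vertices}   # l[v] = 0 for all v except s
    excess = {v: 0 for v in vertices}  # e[v]
    label[s] = len(vertices)           # l[s] = number of vertices

    def push(u, v, amount):
        residual[u][v] -= amount
        residual[v][u] += amount
        excess[u] -= amount
        excess[v] += amount

    # Saturate all out-edges of s.
    for v, cap in list(residual[s].items()):
        if cap > 0:
            push(s, v, cap)

    active = deque(v for v in vertices if v not in (s, t) and excess[v] > 0)
    while active:
        u = active.popleft()
        while excess[u] > 0:
            pushed = False
            for v, cap in residual[u].items():
                if cap > 0 and label[u] == label[v] + 1:
                    was_inactive = excess[v] == 0
                    push(u, v, min(excess[u], cap))
                    if was_inactive and v not in (s, t):
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: one more than the lowest neighbor reachable in the residual graph.
                label[u] = 1 + min(label[v] for v, cap in residual[u].items() if cap > 0)
    return excess[t]


# Hypothetical test graph.
diamond = [("s", "a", 1), ("s", "b", 1), ("a", "b", 1), ("a", "t", 1), ("b", "t", 1)]
print(push_relabel(diamond, "s", "t"))  # 2
```

Unlike the augmenting-path solvers, no traversal from s to t is ever performed: every decision uses only a vertex and its direct neighbors.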

Page 12: PUSH-RELABEL (Workflow)

[Figure: s (l=6); a (l=0, e=1); b (l=0, e=1); d (l=0, e=0); edge capacities 1, 1, 1]

Page 13: PUSH-RELABEL (Workflow)

[Figure: s (l=6); a (l=0, e=1); b (l=1, e=1); d (l=0, e=0); edge capacities 1, 1, 1]

Page 14: PUSH-RELABEL (Workflow)

[Figure: s (l=6); a (l=0, e=1); b (l=1, e=0); d (l=0, e=1); edge capacities 1, 1, 1]

Page 15: PUSH-RELABEL (Parallelism issues)

while there is an applicable push or relabel operation:
    execute the operation

[Figure: s (l=6); a (l=0, e=1); b (l=1, e=0); d (l=0, e=1); edge capacities 1, 1, 1]

At this step, we could relabel a or d. Which one?

Complexity of heuristics:

PRIORITY     LARGEST L    SMALLEST L   FIFO
Complexity   O(V²√E)      O(V²E)       O(V³)

Source of parallelism:

Order affects convergence. Massive parallelism yields random order

Page 16: PUSH-RELABEL (Parallelism issues)

Parallelism drops. Not enough to saturate the GPU

Source: The University of Texas at Austin

In theory, number of threads = number of vertices

In practice, number of active vertices << number of vertices

Page 17: PUSH-RELABEL (Conclusion)

[Figure: s (l=6); a (l=0, e=1); b (l=1, e=0); d (l=0, e=1); edge capacities 1, 1, 1]

• Actual parallelism is low

• Massive parallelism yields random order which damages performance

• We need graph traversals (BFS) for some critical heuristics

Push-relabel not suited for GPU implementation

road_usa: GPU does 20 BFS, CPU does only 3 BFS

CPU is faster since it requires fewer traversals

Page 18: MPM

Page 19: DINIC’S (Workflow)

[Figure: graph with vertices s, a, b, c, d, t; edge capacities 2, 1, 3, 3, 2, 1, 1]

Two augmenting paths of length 3

They have been discovered using just one BFS

Avoid running BFS twice here

Main idea of Dinic’s: reuse BFS results

Edges on paths of length 3

Page 20: DINIC’S (Workflow)

While s is still connected to t in G, do
    Create layer graph GL containing only shortest paths from s to t
    While s is still connected to t in GL, do
        Find augmenting path from s to t in GL
        Push corresponding flow in GL, update edges

[Figure: graph G with vertices s, a, b, c, d, t; edge capacities 2, 1, 3, 3, 2, 1, 1]
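The two nested loops can be sketched as follows. This is an illustrative sequential version, not the presenters' implementation: the function name `dinic`, the level dictionary standing in for the layer graph GL, the recursive DFS and the test graph are assumptions.

```python
from collections import defaultdict, deque


def dinic(edges, s, t):
    """Return (max flow value, number of BFS traversals performed)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0            # make sure the reverse edge exists

    def bfs_levels():
        # BFS distances from s; edges going from level k to k+1 form the layer graph.
        level = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)
        return level

    def dfs(u, limit, level):
        # Push up to `limit` units from u to t using layer-graph edges only.
        if u == t:
            return limit
        for v, cap in residual[u].items():
            if cap > 0 and level.get(v) == level[u] + 1:
                pushed = dfs(v, min(limit, cap), level)
                if pushed > 0:
                    residual[u][v] -= pushed
                    residual[v][u] += pushed
                    return pushed
        level[u] = None                # dead end: drop u from the layer graph
        return 0

    maxflow, bfs_runs = 0, 0
    while True:
        level = bfs_levels()           # outer loop: rebuild the layer graph
        bfs_runs += 1
        if t not in level:
            return maxflow, bfs_runs
        while True:                    # inner loop: augment inside the layer graph
            pushed = dfs(s, float("inf"), level)
            if pushed == 0:
                break
            maxflow += pushed


# Hypothetical graph with two shortest augmenting paths.
diamond = [("s", "a", 1), ("s", "b", 1), ("a", "b", 1), ("a", "t", 1), ("b", "t", 1)]
print(dinic(diamond, "s", "t"))  # (2, 2)
```

Both augmenting paths of the `diamond` graph are found from a single BFS. The second BFS only confirms that t is no longer reachable.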

Page 21: DINIC’S (Workflow)

[Figure: layer graph GL(3) with vertices s, a, b, c, d, t; edge capacities 2, 3, 3, 2, 1]

Page 22: DINIC’S (Workflow)

[Figure: layer graph GL(3) during the DFS step, vertices s, a, b, c, d, t; edge labels 2, 2, 1, 1, 1, 1, 2, 1]

Page 23: DINIC’S (Workflow)

[Figure: layer graph GL(4) with vertices s, a, b, c, d, t; edge labels 2, 1, 1, 1, 1]

Page 24: DINIC’S (Workflow)

[Figure: layer graph GL(4) during the DFS step, vertices s, a, b, c, d, t; edge labels 1, 1, 1, 1, 1, 1]

DFS traverses all vertices on GPU. We lose all advantages of Dinic’s.

Page 25: MPM (Workflow)

While s is still connected to t in G, do
    Create layer graph GL containing only shortest paths from s to t
    While s is still connected to t in GL, do
        Find vertex u with minimum potential m, with potential(u) = min(degree_in(u), degree_out(u)),
            where degree_in(u) and degree_out(u) are the total capacities of the edges entering and leaving u in GL
        push m from u to t, pull m from s to u
        remove all vertices with min(degree_in(u), degree_out(u)) = 0

[Figure: graph G with vertices s, a, b, c, d, t; edge capacities 2, 1, 3, 3, 2, 1, 1]
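The selection step of MPM can be illustrated on its own. The sketch below builds the layer graph and computes the potentials; it does not perform the push/pull/prune phase. The helper names `bfs_dist`, `layer_graph` and `potentials` and the example graph are assumptions (the capacities are not the ones from the slide). For s only the outgoing capacity counts, and for t only the incoming capacity.

```python
from collections import deque


def bfs_dist(adj, root):
    """Breadth-first distances from root over an adjacency dict."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def layer_graph(edges, s, t):
    """Keep only the edges that lie on a shortest path from s to t."""
    fwd, bwd = {}, {}
    for u, v, cap in edges:
        if cap > 0:
            fwd.setdefault(u, []).append(v)
            bwd.setdefault(v, []).append(u)
    ds, dt = bfs_dist(fwd, s), bfs_dist(bwd, t)
    if t not in ds:
        return []
    return [(u, v, cap) for u, v, cap in edges
            if cap > 0 and u in ds and v in dt and ds[u] + 1 + dt[v] == ds[t]]


def potentials(layer_edges, s, t):
    """potential(u) = min(total capacity into u, total capacity out of u)."""
    cap_in, cap_out = {}, {}
    for u, v, cap in layer_edges:
        cap_out[u] = cap_out.get(u, 0) + cap
        cap_in[v] = cap_in.get(v, 0) + cap
    pot = {}
    for u in set(cap_in) | set(cap_out):
        if u == s:
            pot[u] = cap_out[u]
        elif u == t:
            pot[u] = cap_in[u]
        else:
            pot[u] = min(cap_in.get(u, 0), cap_out.get(u, 0))
    return pot


# Hypothetical graph; the edge (b, d) is not on any shortest s-t path.
edges = [("s", "a", 2), ("s", "c", 1), ("a", "b", 3), ("c", "d", 3),
         ("b", "t", 2), ("d", "t", 2), ("a", "d", 1), ("b", "d", 5)]
pot = potentials(layer_graph(edges, "s", "t"), "s", "t")
u = min(pot, key=pot.get)
print(u, pot[u])  # c 1
```

Because c has the smallest potential, 1 unit of flow can always be pushed from c to t and pulled from s to c without any search: every other vertex in the layer graph can forward at least that much.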

Page 26: MPM (Workflow)

[Figure: graph G with vertices s, b, c, d, t; edge capacities 2, 3, 3, 2, 1]

Page 27: MPM (Workflow)

[Figure: layer graph GL(3) with vertices s, b, c, d, t; edge capacities 2, 3, 3, 2, 1]

c is selected, so we know that 1 unit of flow will pass through it

Page 28: MPM (Workflow)

[Figure: layer graph GL(3) with vertices s, b, c, d, t; edge capacities 1, 2, 3, 2]

c is selected, so we know that 1 unit of flow will pass through it

Page 29: MPM (Workflow)

[Figure: layer graph GL(3) with vertices s, b, d, t; edge capacities 1, 3, 2]

Page 30: MPM (Dinic’s vs MPM)

[Figure: two panels over a graph with vertices s, a, b, c, d, e, f, g, h, t. Left panel: "Dinic’s - DFS", legend "Processed but useless edges" and "Processed and acceptable edges". Right panel: "MPM – Push/Pull/Prune".]

Across the graph, min potential = 1 (vertex h)

Pushing 1 to t, pulling 1 from s, using any edges

Page 31: MPM (Dinic’s vs MPM)

[Figure: graph with vertices s, c, d, h, t; edge labels 7, 3, 1, 3, 4]

Saturating one augmenting path on GPU:

• MPM: push/pull/prune process, 30 µs

• Edmonds-Karp/Dinic’s: one BFS, > 1 ms

Performance is bounded by kernel launch latency

Example: Wikipedia 2011

• MPM: 5 BFS, 6000 augmenting paths

• EK: 6000 BFS
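A back-of-the-envelope estimate shows why this matters. It assumes, purely for illustration, that every BFS costs exactly 1 ms (the slide only gives this as a lower bound) and every push/pull/prune step exactly 30 µs. These are not measured totals.

```latex
\begin{aligned}
T_{\mathrm{EK}}  &\approx 6000 \times 1\,\mathrm{ms} = 6\,\mathrm{s} \\
T_{\mathrm{MPM}} &\approx 5 \times 1\,\mathrm{ms} + 6000 \times 30\,\mu\mathrm{s}
                  = 0.005\,\mathrm{s} + 0.18\,\mathrm{s} = 0.185\,\mathrm{s}
\end{aligned}
```

Under these assumptions the traversal cost of Edmonds-Karp alone is roughly 30 times the whole MPM estimate.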

Page 32: MPM (GPU design)

[Figure: graph with vertices s, c, d, h, t; edge labels 7, 3, 1, 3, 4]

MPM paper gives a high-level implementation

Most of the work went into GPU implementation design (2 out of 3 months)

Page 33: MAXIMUM FLOW RESULTS

GRAPH      N          NNZ         SPEED-UP (AVG)   SPEED-UP (MIN)   SPEED-UP (MAX)
wiki03     455436     3811198     9.1              1.7              15.3
wiki11     3721339    121043107   22.5             19.8             28.9
road_usa   23947347   57708624    2                0.7              4.2
road CA    1971281    5533214     2.3              0.8              4.9

Galois on dual socket Haswell 16 cores vs NVIDIA Titan X (Pascal)

Page 34: EFFICIENT MAXIMUM FLOW ALGORITHM (Takeaways)

• Black-box solver: a large variety of applications can be cast as a flow problem.

• Data-dependent, irregular algorithm: the challenge is how to create enough “real” parallelism and how to avoid latency issues on the GPU.

• Order-of-magnitude speed-ups on wide graphs. Long graphs require a more efficient graph traversal implementation.

Page 35: REFERENCES

• An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision, Yuri Boykov, Vladimir Kolmogorov

• Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities, Noriko Imafuji, Masaru Kitsuregawa

• An O(|V|³) algorithm for finding maximum flows in networks, V.M. Malhotra, M. Pramodh Kumar, S.N. Maheshwari

• Parallelizing the Push-Relabel Max Flow Algorithm, Victoria Popic, Javier Velez
