External Memory Graph Algorithms and Applications to GIS Laura Toma Duke University July 14 2003


Page 1:

External Memory Graph Algorithms and

Applications to GIS

Laura Toma

Duke University

July 14 2003

Page 2:

Massive Data

• Massive datasets are being collected everywhere
• Storage management software is a billion-$ industry

Examples:
• Geography: NASA satellites generate 1.2TB per day
• Web: web crawl of 200M pages and 2000M links; Akamai stores 7 billion clicks per day
• Phone: AT&T 20TB phone call database
• Consumer: WalMart 70TB database, buying patterns (supermarket checkout)

Page 3:

I/O Model [AV’88]

N = problem size
B = disk block size
M = memory size

I/O-operation:
• movement of one block of data from/to disk

Complexity measure: number of I/Os

Fundamental bounds:
• Scanning: scan(N) = O(N/B) I/Os
• Sorting: sort(N) = O((N/B) log_{M/B} (N/B)) I/Os

In practice B and M are big:
N/B < (N/B) log_{M/B} (N/B) << N
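To get a feel for the gap between scan(N) = O(N/B), sort(N) = O((N/B) log_{M/B}(N/B)) and N, the bounds can be evaluated numerically. This is a minimal sketch that ignores constant factors; the function names and the machine parameters are hypothetical and counted in items, not bytes.

```python
import math

def scan_ios(n: int, b: int) -> int:
    """Block transfers needed to read n items sequentially."""
    return math.ceil(n / b)

def sort_ios(n: int, m: int, b: int) -> float:
    """(N/B) * log_{M/B}(N/B): the sorting bound without constant factors."""
    blocks = n / b
    if blocks <= 1:
        return blocks
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, m / b))

# Hypothetical parameters (assumption, not taken from the slides).
N, M, B = 10**9, 10**8, 10**5
print(scan_ios(N, B), sort_ios(N, M, B), N)
```

With large B and M the logarithmic factor stays very small, which is why an O(sort(N)) algorithm is far cheaper than one that performs one I/O per item.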

Page 4:

Outline

I/O-efficient graph algorithms
• Problems, techniques and results

Algorithms for planar graphs using graph separation

A GIS application: TerraFlow

Page 5:

I/O-Efficient Graph Algorithms

Input: G = (V,E)
• Assume edge-list representation of G stored on disk: Adj(v1) Adj(v2) Adj(v3) …

Basic problems:
• BFS, DFS, CC, SSSP, MST
• Hard in external memory!
• Lower bound: Ω(min{V, sort(V)}) (practically Ω(sort(V)))
• Standard internal memory algorithms for these problems use O(E) I/Os

Page 6:

BFS and DFS

DFS(u)
  Mark u
  For every v in Adj(u)
    • If v not marked, DFS(v)

Internal memory: O(V+E)

External memory:
• one I/O per vertex to load adjacency list ⇒ Ω(V) I/Os
• one I/O per edge to check if v is marked ⇒ Ω(E) I/Os
⇒ O(V+E) = O(E) I/Os

Page 7:

SSSP and MST

Dijkstra’s algorithm
• Maintain p-queue on vertices not yet included in SSSP
• Repeatedly
  • DeleteMin(v) and relax each adjacent edge (v,u):
    if d(s,u) > d(s,v) + w_vu then DecreaseKey(u, d(s,v) + w_vu)

External memory:
• one I/O per vertex to load adjacency list ⇒ Ω(V) I/Os
• External p-queue: O(E) Insert/Delete/DeleteMin in O(sort(E)) I/Os
• DecreaseKey: O(1) I/Os to read key of u ⇒ Ω(E) I/Os
⇒ O(V+E+sort(E)) = O(E) I/Os
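As an internal-memory reference point for the relaxation step, here is a minimal sketch of Dijkstra's algorithm. It is an illustration only: DecreaseKey is replaced by lazy re-insertion into Python's heapq (an assumption of this sketch, not the external priority queue discussed here), and the dict-of-adjacency-lists input format is hypothetical.

```python
import heapq

def dijkstra(adj, s):
    """adj: dict vertex -> list of (neighbor, weight). Returns distances from s."""
    dist = {s: 0}
    heap = [(0, s)]
    done = set()
    while heap:
        d, v = heapq.heappop(heap)          # DeleteMin
        if v in done:
            continue                        # stale entry left by a lazy "DecreaseKey"
        done.add(v)
        for u, w in adj.get(v, []):         # relax each adjacent edge (v, u)
            nd = d + w
            if u not in dist or dist[u] > nd:
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

adj = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 5)]}
print(dijkstra(adj, "s"))
```

Every lookup of `dist[u]` in the inner loop is the step that costs one I/O per edge when the distances live on disk.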

Page 8:

I/O-Efficient Graph Algorithms

Problems:

1. Random (unstructured) accesses to the adjacency lists of vertices as they are visited ⇒ Ω(V) I/Os

2. Need to check if v has been already visited and/or read its key ⇒ Ω(E) I/Os

• o(E) algorithm: solve (2)

• o(V) algorithm: solve (1) and (2)

Page 9:

o(E) Algorithms

Store edges to previously seen vertices

Undirected/directed BFS, DFS, SSSP

• buffered repository tree (BRT) [BGVW’00]

Insert(v, e), ExtractAll(v)

Process/update all adjacent edges without checking if necessary

Undirected SSSP:

• I/O-efficient tournament tree [KS’96]

DecreaseKey(v,k)

Undirected MST: O(V + sort(E)) [ABT’01]

• Maintain a priority queue on edges incident to current MST

• How to decide if v is in MST without doing one I/O?

– If next edge returned by DeleteMin is the same then v already in MST


Page 10:

o(V) Algorithms

CC and MST: [MR’99, ABT’01]
• graph contraction
  • Goal: reduce the problem to the same problem on a smaller graph by selecting disjoint subgraphs and contracting them
  • A contraction phase reduces the number of vertices by a constant fraction
  • Typically use a sequence of contraction steps: G = G0 → G1 → G2 → … → Gi → …
• CC and MST algorithms: general idea
  • Use O(log(V/V’)) contraction steps to obtain G’ with V’ = O(E/B) vertices
  • Use an O(V + sort(E)) algorithm on G’
  • O(sort(E) · log log(VB/E)) I/Os

Page 11:

o(V) Algorithms

• Undirected BFS, SSSP [MM’02, MZ’03]
• Clustering
  • partition graph into V/k subgraphs (clusters) of k vertices
  • BFS idea: keep a pool of hot clusters
    • A cluster is loaded in the pool once
    • A cluster stays in the pool until all its vertices have been visited
  • O(V/k + kE/B + sort(E)) I/Os
  • O(√(VE/B) + sort(E)) I/Os
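The step from a clustering bound of the form V/k + kE/B + sort(E) to √(VE/B) + sort(E) is a balancing argument over the cluster size k. A short derivation, assuming the two k-dependent terms are exactly V/k and kE/B up to constants:

```latex
\begin{align*}
f(k) &= \frac{V}{k} + \frac{kE}{B}, \qquad
f'(k) = -\frac{V}{k^{2}} + \frac{E}{B} = 0
\;\Longrightarrow\; k = \sqrt{\frac{VB}{E}},\\
f\!\left(\sqrt{\tfrac{VB}{E}}\right)
&= V\sqrt{\frac{E}{VB}} + \frac{E}{B}\sqrt{\frac{VB}{E}}
 = 2\sqrt{\frac{VE}{B}} .
\end{align*}
```

So choosing k = √(VB/E) makes the cost of loading clusters and the cost of rescanning the pool equal, each contributing √(VE/B) I/Os.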

Page 12:

Upper Bounds

General undirected graphs
• CC, MST: O(sort(E) · log log(VB/E)) [MR’99, ABT’01]
• BFS: O(√(VE/B) + sort(E)) [MM’02]
• SSSP: O(√((VE/B) · log(W/w)) + sort(E)) [MZ’03]
• DFS: O((V + E/B) · log V + sort(E)) [KS’96]

General directed graphs
• BFS, DFS, SSSP: O((V + E/B) · log V + sort(E)) [BVWB’00]
• Topological sort

Page 13:

Upper Bounds: Sparse Graphs

Sparse graphs E = O(V)
• CC, MST: O(sort(V)) if graph stays sparse under edge contraction
• Undirected BFS: O(V/√B); O(sort(V))? open
• Undirected SSSP: O((V/√B) · √(log(W/w))); O(sort(V))? open
• Undirected DFS: O(V); o(V)? open
• Directed BFS, DFS, SSSP

O(sort(N)) BFS, SSSP, (DFS) on special classes of sparse graphs
• Planar
• Outerplanar, grid, bounded-treewidth

Page 14:

Planar Undirected Graphs

BFS, DFS, SSSP: O(sort(N)) I/Os
• O(sort(N)) I/O-efficient reductions [ABT’00, AMTZ’01]
• Separators can be computed in O(sort(N)) I/Os [MZ’02]

[Figure: reductions among separators, BFS, SSSP and DFS; each reduction takes O(sort(N)) I/Os [ABT’00, AMTZ’01]]

Page 15:

I/O-Efficient Graph Algorithms: Our Contributions

An O(sort(E) · log log(VB/E)) MST algorithm on general undirected graphs

O(sort(N)) algorithms on planar graphs
• Reducibility on planar undirected graphs
• Planar digraphs: SSSP, BFS, directed ear decomposition and topological sort

An O(sort(N) log N) DFS algorithm for planar undirected graphs
• O(sort(N)) cycle separator

All-pair-shortest-paths and diameter
• Planar digraphs
• General undirected graphs

Data structure for shortest path queries on planar digraphs
• Trade-off space-query

GIS application: TerraFlow
• Flow modeling on grid terrains
• r.terraflow: Port into GRASS, the open source GIS

Page 16:

Outline

I/O-efficient graph algorithms
• Problems, techniques and results

Algorithms for planar graphs using graph separation
• Shortest paths (SSSP, BFS, APSP)

• DFS

• Topological sort on planar DAGs

• Data structure for SP queries

A GIS application: TerraFlow

Page 17:

Planar graph separation: R-division

A partition of a planar graph using a set S of separator vertices into O(N/R) subgraphs (clusters) Gi of at most R vertices each such that:
• There are O(N/√R) separator vertices in total
• There is no edge between a vertex in Gi and a vertex in Gj, i ≠ j
• Each cluster is adjacent to O(√R) separator vertices

Page 18:

R-division

Boundary vertices Bnd(Gi) of Gi
• The separator vertices adjacent to Gi

Boundary set
• Maximal subset of separator vertices that are adjacent to the same clusters

Lemma [Frederickson’87]:
• An R-division of a planar graph of bounded degree has O(N/R) boundary sets.

Page 19:

R-divisions and Planar Graph Algorithms

R-divisions [Frederickson’87]
• dynamic graph algorithms [GI’91, KS’93], faster SP algorithms [HKRS’97], SP data structures

In external memory choose R = B²
• O(N/B) separator vertices
• O(N/B²) clusters of O(B²) vertices each and O(B) boundary vertices
• O(N/B²) boundary sets
• Can be computed in O(sort(N)) I/Os [MZ’02]

B²-division ⇒ SSSP, BFS, DFS, topological sort, APSP, diameter, SP data structures, …

Page 20:

Planar SSSP

1. Compute a B²-division of G
2. Construct a substitute graph GR on the separator vertices such that it preserves SP in G between any u,v in S
   • replace each subgraph Gi with a complete graph on Bnd(Gi)
   • for any u, v on Bnd(Gi), the weight of edge (u,v) is δ_Gi(u,v)
   GR has O(N/B²) · O(B²) = O(N) edges and O(N/B) vertices
3. Compute SSSP on GR
4. Compute SSSP to vertices inside clusters

Page 21:

Planar SSSP

SSSP on GR with O(N/B) vertices and O(N) edges

Dijkstra’s algorithm with I/O-efficient p-queue
• Access to adjacency list of each vertex takes O(N/B) I/Os
• O(N) Insert/Delete/DeleteMin in O(sort(N)) I/Os [A95]
• But... need dist(s,u) for all u in Adj(v)

Keep list LS = {dist(s,u) : u in S}
• For each vertex v read from LS the current distances of adjacent vertices
• O(N) edges ⇒ O(N) accesses to LS ⇒ O(N) I/Os

Page 22:

Planar SSSP

SSSP on GR

Idea: use boundary sets

Store LS so that vertices in the same boundary set are consecutive
• There are O(N/B²) boundary sets
• Vertices in same boundary set have same O(B) neighbors in GR, assuming G has bounded degree
• Each boundary set is accessed once by each neighbor in GR
• Each boundary set has size O(B)

⇒ O(N/B²) × O(B) = O(N/B) I/Os

Page 23:

Planar APSP

Straightforward bound: O(N sort(N)) = O(sort(N²))
Improved to optimal O(scan(N²))

Idea: compute SP from all vertices in a cluster while cluster is in memory

For each cluster Gi
  For any α in Bnd(Gi) compute SSSP(α) in GR
  For each cluster Gj
    load in memory Gj, Bnd(Gj) and δ(Bnd(Gi), Bnd(Gj))
    compute the shortest paths between all vertices u in Gi and v in Gj:
      d(u,v) = min{ δ_Gi(u,α) + δ_GR(α,β) + δ_Gj(β,v) | α in Bnd(Gi), β in Bnd(Gj) }
    write the output

O(N/B²) clusters ⇒ O(sort(N²)/B) [compute] + O(scan(N²)) [output]

Diameter: O(sort(N²)/B)
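The combination step, a minimum over all pairs of boundary vertices (α, β), can be sketched as follows. This is an illustrative sketch only: the three distance tables are assumed to be precomputed and already in memory, and the function name and dict-based layout are hypothetical.

```python
def cross_cluster_distance(d_u_to_bnd, d_sub, d_bnd_to_v):
    """min over (alpha, beta) of d(u, alpha) + d_GR(alpha, beta) + d(beta, v).

    d_u_to_bnd: dict alpha -> distance from u to alpha inside u's cluster
    d_sub:      dict (alpha, beta) -> distance in the substitute graph GR
    d_bnd_to_v: dict beta -> distance from beta to v inside v's cluster
    """
    inf = float("inf")
    best = inf
    for alpha, du in d_u_to_bnd.items():
        for beta, dv in d_bnd_to_v.items():
            mid = d_sub.get((alpha, beta), inf)   # missing pair = unreachable
            best = min(best, du + mid + dv)
    return best

d_u = {"a1": 2, "a2": 5}
d_gr = {("a1", "b1"): 10, ("a2", "b1"): 1, ("a1", "b2"): 3}
d_v = {"b1": 1, "b2": 7}
print(cross_cluster_distance(d_u, d_gr, d_v))
```

With R = B², each boundary has O(B) vertices, so one pair of vertices needs O(B²) table lookups, all against data that fits in the loaded clusters.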

Page 24:

General AP-BFS

The APSP idea (compute SP from all vertices of a cluster while the cluster is in main memory) can be generalized to other algorithms which use clustering, like the BFS algorithm [MM’02] on general undirected graphs.

Theorem:

• AP-BFS of a general undirected graph and its unweighted diameter can be computed in O(V sort(E)) I/Os.

Note:

• general undirected BFS is O(sort(E)) amortized over V vertices

Page 25:

Planar DFS

Idea: Partition the faces of G into levels around a source face containing s and grow DFS level-by-level
• Levels can be obtained from BFS in dual graph
• Structure of levels is simple (bicomps are cycles)
• Rooting/Attaching: use that a spanning tree is a DFS-tree if and only if it has no cross edges

A DFS-tree of a planar graph can be computed in O(sort(N)) I/Os

[Figure: levels H0, H1, H2 around the source vertex s]

Page 26:

Planar Graphs

Shortest paths
• generalize to digraphs: compute B²-division on the underlying graph
• BFS, SSSP in O(sort(N))
• APSP (transitive closure) in O(scan(N²))
• diameter in O(sort(N²)/B)

DFS
• Undirected
  • O(sort(N)) using BFS in the dual
    [O(sort(N) log N) direct algorithm using cycle separators]
• Directed
  • The planar undirected DFS algorithms do not extend to digraphs
  • O(sort(N)) DFS? open

Page 27:

Outline

I/O-efficient graph algorithms
• Problems, techniques and results

Algorithms for planar graphs using graph separation
• Shortest paths (SSSP, BFS, APSP)
• DFS
• Topological sort on planar DAGs
  • O(sort(N)) using directed ear decomposition (DED) of its dual
  • Simplified algorithm using B²-division
• Data structure for SP queries

A GIS application: TerraFlow

Page 28:

Directed Ear Decomposition (DED)

A directed ear decomposition of a graph G is a partition of G into simple directed paths P0, P1, …, Pk such that:

• P0 is a simple cycle

• endpoints of each Pi i>0 are in lower-indexed paths Pj, Pl, j,l<i

• internal vertices of each Pi i>0 are not in any Pj j<i

G has a directed ear decomposition if and only if it is strongly connected (there exists a directed cycle containing each pair of vertices u,v).

Planar DED: O(sort(N)) I/Os

Page 29:

Planar Topological Sort using DED

Theorem [KK’79]: The directed dual of a planar DAG is strongly connected and therefore has a directed ear decomposition.

Idea:
• Place vertices to the left of P0 before vertices to the right
• Sort two sets recursively

Used in PRAM topological sort algorithm [KK93, K93]
• PRAM simulation ⇒ O(sort(N) log N) I/Os
• Improved to O(sort(N)) by defining and utilizing ordered ear decomposition tree [ATZ’03]

Page 30:

O(sort(N)) Topological Sort using B²-division

Same idea as in planar SSSP algorithm

Construct a substitute graph GR using B²-division
• edge from v to u on boundary of Gi if there exists a path from v to u in Gi

Topologically sort GR (separator vertices in G):
• Store in-degree of each vertex in list L
• Maintain list of in-degree zero vertices
• Repeatedly:
  • Number an in-degree zero vertex v
  • Consider all edges (v,u) and decrement in-degree of u in L

Analysis exactly as in SSSP algorithm ⇒ O(scan(N)) I/Os if B²-division is given

Page 31:

O(sort(N)) Topological Sort using B²-division

Problem:
• Not clear how to incorporate removed vertices from G in topological order of separator vertices (GR)

Solution (assuming only one in-degree zero vertex s for simplicity):
• Longest-path-from-s order is a topological order
• Longest paths to removed vertices locally computable from longest-paths to boundary vertices

[Figure: example planar DAG from s to t with a B²-division]

Page 32:

O(sort(N)) Topological Sort using B²-division

1. Compute a B²-division of G
2. Construct substitute graph GR using
   • Weight of edge between v and u on boundary of Gi equal to length of longest path from v to u in Gi
3. Compute longest path to each vertex in GR (same as in G):
   • Maintain list L of longest paths seen to each vertex
   • Repeatedly:
     • Obtain longest path for next vertex v in topological order
     • Consider all edges (v,u) and update longest path to u
4. Find longest path to vertices inside clusters

Analysis exactly as for planar SSSP algorithm ⇒ O(scan(N)) I/Os if B²-division is given
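The claim that ordering vertices by longest-path length yields a topological order can be checked on a small DAG. The sketch below is a plain internal-memory illustration, not the I/O-efficient algorithm: it counts path length in edges and uses a hypothetical dict-of-successors representation.

```python
from collections import deque

def longest_path_lengths(adj, vertices):
    """Length (in edges) of the longest path ending at each vertex of a DAG."""
    indeg = {v: 0 for v in vertices}
    for v in vertices:
        for u in adj.get(v, []):
            indeg[u] += 1
    ready = deque(v for v in vertices if indeg[v] == 0)
    longest = {v: 0 for v in vertices}
    while ready:
        v = ready.popleft()                  # next vertex in topological order
        for u in adj.get(v, []):
            longest[u] = max(longest[u], longest[v] + 1)  # update longest path to u
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return longest

def order_by_longest_path(adj, vertices):
    """Sort vertices by longest-path length; ties may be broken arbitrarily."""
    longest = longest_path_lengths(adj, vertices)
    return sorted(vertices, key=lambda v: longest[v])

dag = {"s": ["a", "b", "t"], "a": ["b"], "b": ["t"]}
print(order_by_longest_path(dag, ["t", "b", "a", "s"]))
```

For every edge (v,u) the longest path to u is at least one longer than the longest path to v, so v always precedes u in the sorted order.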

Page 33:

Outline

I/O-efficient graph algorithms
• Problems, techniques and results

Algorithms for planar graphs using graph separation
• Shortest paths (SSSP, BFS, APSP)

• DFS

• Topological sort on planar DAGs

• Data structure for SP queries

A GIS application: TerraFlow

Page 34:

Data Structure for SP Queries on Planar Digraphs

Problem: pre-process a planar digraph into a data structure in order to efficiently answer distance (shortest path) queries between arbitrary vertices

Trade-off space-query: O(S) space, query = ?
• The two extreme straightforward solutions:
  • O(N) space, O(sort(N)) I/O query
  • O(N²) space, O(1) I/O query

Related work:
• Planar graphs: [Arikati et al, Djidjev, 1996] [Chen & Xu, 2000]
  • Space-query trade-off: for any S in [N, N²], S × Q = O(N²)
• General graphs:
  • approx shortest paths [Cohen, Halperin, Zwick, …]
• I/O-model: O(N√N) space, O(√N/B) query [HMZ’99]

Page 35:

Data Structure for SP Queries on Planar Digraphs

Basic data structure [Arikati et al, Djidjev]:
• Recursively, compute a separator and store for each vertex u in G the shortest path from u to all separator vertices.
• Space O(N√N), query time O(√N), O(√N/B) I/Os [HMZ’99]

Generalized to any S in [N, N²]: O(S) space, Q = O(N²/S)
• Use R-division
• S in [N, N^(3/2)]: Store shortest paths between the separator vertices and compute shortest path in each cluster on the fly.
• S in [N^(3/2), N²]: Pre-process each cluster as a basic data structure and for any vertex u in G store shortest paths from u to all separator vertices.

I/O-model
• S in [N, N^(3/2)]: ?
• S in [N^(3/2), N²]: O(S) space, O(N²/(SB)) query using [HMZ’99]

Page 36:

Data Structure for SP Queries on Planar Digraphs

General framework: Compute an R-division. Store APSP between separator vertices. This uses space O(N²/R).

Query, for u in Gi and v in Gj: δ(u,v) = min{ δ_Gi(u,α) + δ_GR(α,β) + δ_Gj(β,v) | α in Bnd(Gi), β in Bnd(Gj) }

Problems
1. Store APSP between separator vertices so that the O(R) distances δ(Bnd(Gi), Bnd(Gj)) can be retrieved efficiently in O(scan(R)) I/Os
2. Compute δ_Gj(u,v) in O(scan(R)) I/Os
   • Pre-process each cluster recursively
3. Compute δ_Gj(u, Bnd(Gj)) in O(scan(R)) I/Os
   • Pre-process each cluster into a data structure for answering all-boundary-SP queries

Page 37:

Data Structure for SP Queries on Planar Digraphs

Let G be a planar graph of size N and Bnd(G) its boundary of size O(N^(1/2)). There exists a data structure that uses space O(N lg N) and answers all-boundary-shortest-path queries in O(N/B) I/Os.

Theorem: For any S in [N, N²/B] there exists a data structure which answers distance queries in O(N²/(SB)) I/Os and can be built in O(√S · sort(N)) I/Os. The size is O(S log_{S/N} N) if S ∈ [N log N, N²/B], and … if S ∈ [N, N log N].
• For S = Θ(N): O(N log² N) space and O(N/B) query
• For any S/N = Ω(N^ε), or S = Ω(N^(1+ε)), for some ε in (0,1]: there exists a data structure of size O(S) which answers distance queries in O(N²/(SB)) I/Os and can be built in O(√S · sort(N)) I/Os.

Page 38:

Outline

I/O-efficient graph algorithms
• Problems, techniques and results

Algorithms for planar graphs using graph separation
• Shortest paths

• DFS

• Topological sort on planar DAGs

• Data structure for SP queries

A GIS application: TerraFlow

Page 39:

DEM Representations

[Figure: four DEM representations: TIN, grid (example 3 x 3 elevation grid with rows 3 2 4 / 7 5 8 / 7 1 9), contour lines, sample points]

Grid DEMs ⇒ grid graphs
On grid graphs: BFS, SSSP, CC in O(sort(N)) I/Os

TerraFlow

Page 40:

Example: LIDAR Terrain Data

• Massive (irregular) point sets (1-10m resolution)
• Relatively cheap and easy to collect

Example: Jockey’s Ridge (NC coast)

Page 41:

Modeling Flow on Terrains

What happens when it rains?

• Predict areas susceptible to floods.

• Predict location of streams.

Flow is modeled by computing two basic attributes from the DEM of the terrain:

• Flow Direction (FD)

• The direction water flows at a point

• Flow Accumulation (FA)

• Total amount of water that flows through a point if water is distributed according to the flow directions


Page 42:

Flow Accumulation of Panama

Page 43:

Panama Flow Accumulation: zoom

Page 44:

GIS Performance on Massive Data

GRASS (open source GIS)
• Killed after running for 17 days on a 6700 x 4300 grid (approx. 50 MB dataset)

TARDEM (research, U. Utah)
• Killed after running for 20 days on a 12000 x 10000 grid (approx. 240 MB dataset)
• CPU utilization 5%, 3GB swap file

ArcInfo (commercial GIS)
• Can handle the 240MB dataset
• Doesn’t work for datasets bigger than 2GB

Page 45:

Flow Direction (FD) on Grids

On grids: Approximated using 3x3 neighborhood

Problem: flat areas - Plateaus and sinks

Goal: compute FD grid
• Every cell has flow direction
• Flow directions do not induce cycles
• Every cell has a flow path outside the terrain
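A single-flow-direction (D8-style) choice over the 3x3 neighborhood can be sketched as below. This is an assumption-laden illustration rather than TerraFlow's code: it picks the steepest downslope neighbor, measures diagonal steps as √2 cell widths, and returns None on flat cells and sinks, which are exactly the cases that need the separate treatment of flat areas.

```python
import math

# Offsets of the 8 neighbors in the 3x3 neighborhood.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sfd_direction(elev, r, c):
    """Offset (dr, dc) of the steepest downslope neighbor of (r, c), or None."""
    best, best_slope = None, 0.0
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(elev) and 0 <= nc < len(elev[0]):
            drop = elev[r][c] - elev[nr][nc]
            slope = drop / math.hypot(dr, dc)   # diagonal steps are sqrt(2) long
            if slope > best_slope:
                best, best_slope = (dr, dc), slope
    return best

elev = [[9, 8, 7],
        [8, 5, 4],
        [7, 4, 1]]
print(sfd_direction(elev, 1, 1))
```

Because a direction is assigned only when the slope is strictly positive, the resulting directions cannot form a cycle; cells returning None are the plateaus and sinks.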


Page 46:

FD on Flat Areas

Plateaus
• A cell flows towards the nearest spill point on the boundary of the plateau
• Compute FD on plateaus using CC and BFS

Sinks
• Route the water uphill out of the sink by modeling flooding: uniformly pouring water on terrain until steady-state is reached
• Flooding removes (fills) sinks
• Assign uphill flow directions on the original terrain by assigning downhill flow directions on the flooded terrain

Page 47:

Flooding

Watershed: part of the terrain that flows into a sink

Sinks ⇒ partition of terrain into watersheds ⇒ watershed graph GT
• Vertices are watersheds; add vertex for the “outside” watershed
• Edge (u,v) if watersheds u,v are adjacent
• Edge (u,v) labeled with lowest height on boundary between u and v

Flooding: Compute for each watershed u the height hu of the lowest-height path in GT from u to the “outside” watershed.
• the height of a path is the height of the highest edge on path

Page 48:

Flooding

Plane-sweep algorithm with a Union-Find structure
• Initially only the outside watershed is done
• Sweep watershed graph bottom-up with a horizontal plane
• When hit edge (u,v)
  • If both watersheds u and v are done, ignore
  • If neither is done, union them
  • If precisely one is not done, raise it to h(u,v) and mark it done

Theorem: Flooding and the FD grid can be computed in O(sort(N)) I/Os on a grid DEM of size N.
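The plane sweep over the watershed graph can be sketched in internal memory as follows. This is a minimal illustration under assumptions of this sketch: the graph is given as a list of (u, v, height) edges, the outside watershed has the hypothetical label "outside", and a simple in-memory union-find with member lists stands in for the I/O-efficient structure.

```python
def flood_heights(watersheds, edges, outside="outside"):
    """edges: list of (u, v, h). Returns dict watershed -> raise height h_u."""
    parent = {w: w for w in list(watersheds) + [outside]}
    members = {w: [w] for w in parent}
    done = {w: (w == outside) for w in parent}    # tracked per union-find root
    height = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]         # path halving
            x = parent[x]
        return x

    for u, v, h in sorted(edges, key=lambda e: e[2]):   # sweep bottom-up
        ru, rv = find(u), find(v)
        if ru == rv or (done[ru] and done[rv]):
            continue                              # both done (or same group): ignore
        if not done[ru] and not done[rv]:
            parent[rv] = ru                       # neither done: union them
            members[ru].extend(members.pop(rv))
        else:
            r = rv if done[ru] else ru            # the group that is not done yet
            for w in members[r]:
                height[w] = h                     # raise the whole group to h(u, v)
            done[r] = True
    return height

edges = [("A", "B", 2), ("B", "outside", 5), ("A", "C", 3), ("C", "outside", 4)]
print(flood_heights(["A", "B", "C"], edges))
```

In the example, A, B and C merge at heights 2 and 3 and then spill to the outside over the edge of height 4, so all three are raised to 4; the edge of height 5 is ignored.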


Page 49:

Flow Accumulation (FA) on Grids

FA models the amount of water flowing through each cell under “uniform rain”

• Initially one unit of water in each cell

• Water distributed from each cell to neighbors pointed to by its FD

• Flow conservation: If several FD, distribute proportionally to height difference

• Flow accumulation of cell is total flow through it

Goal: compute FA for every cell in the grid (FA grid)

Theorem:

The FA grid can be computed in O(sort(N)) I/Os.


Page 50:

TerraFlow

TerraFlow: implementation of I/O-efficient FD and FA algorithms
• Significantly faster on very large grids than existing GIS software
• Scalable: 1 billion elements!! (>2GB data)
• Allows multiple methods of flow modeling

Implementation
• C++, uses TPIE (Transparent Parallel I/O Environment)
• Library of I/O-efficient modules developed at Duke

Experimental platform
• TerraFlow, ArcInfo: 500MHz Alpha, FreeBSD 4.0, 1GB RAM
• GRASS/TARDEM: 500MHz Intel PIII, FreeBSD/Windows, 1GB RAM

http://www.cs.duke.edu/geo*/terraflow

Page 51:

TerraFlow

• GRASS cannot handle Hawaii dataset (killed after 17 days)
• TARDEM cannot handle Cumberlands dataset (killed after 20 days)
• Significant speedup over ArcInfo (ESRI) for large datasets
  • East-Coast: TerraFlow 8.7 hours, ArcInfo 78 hours
  • Washington: TerraFlow 63 hours, ArcInfo: %

[Figure: bar chart of running time in hours (0 to 90) per dataset (Kaweah, Puerto Rico, Sierra Nevada, Hawaii, Cumberlands, Lower NE, East-Coast, Midwest, Washington); series: TerraFlow 512, TerraFlow 128, ArcInfo 512, ArcInfo 128]

Page 52:

TerraFlow in GRASS

r.terraflow

• Port of TerraFlow into GRASS

• Available with GRASS 5.0.2

Preliminary results on

• Quality of output

• Comparison with r.watershed

• SFD, MFD comparison

• Performance analysis

Good response from users

http://grass.itc.it

Page 53:

Preliminary Experimental Results

PIII dual 1GHz processor, 1GB RAM

Dataset            | Grid dimensions | Grid size (million elements) | r.terraflow
Kaweah             | 1163 x 1424     | 1.6                          | 1.85 min
Puerto Rico        | 4452 x 1378     | 5.9                          | 4.65 min
Sierra Nevada      | 3750 x 2672     | 9.5                          | 19.22 min
Hawaii             | 6784 x 4369     | 28.2                         | 22.35 min
Lower New England  | 9148 x 8509     | 77.8                         | 114 min
Panama             | 11283 x 10862   | 122.5                        | 3.5 hr

r.watershed times, top to bottom as listed: 9.2 min; 93 min; 18.2 hours; killed after 6 days; < 1% done

Page 54:

I/O-Efficient GIS: Future Directions

TerraFlow
• Extend flow direction modeling (D-inf)
• Realistic treatment of flat areas
• Partial flooding
• Computing complete watershed hierarchy

Processing LIDAR data
• Point to grid conversion, point to TIN conversion, terrain simplification, Delaunay triangulation…

TINs
• Practical algorithms on triangulations
• Flow modeling on TINs
  • Geometric? Graph theoretical?

Page 55:

I/O-Efficient Graph Algorithms: Open Problems

Improved algorithms for general digraphs

O(sort(N)) DFS on planar digraphs

• Planar DAGs: can a DFS-tree be computed using topological order?

O(sort(E)) algorithm for CC/MST

Improved DFS on general undirected graphs (clustering?)

Simple and feasible O(sort(N)) algorithms for planar graphs and in particular for triangulations

Dynamic data structures for planar graphs

Page 56:

The End

Page 57:

Upper Bounds

General undirected graphs

Dense graphs
• CC, MST: O(sort(E)) if E ≥ VB
• BFS: O(sort(E)) if E ≥ VB / log²_{M/B}(V/B)
• SSSP: O(sort(E)) if E ≥ VB · log(W/w) / log²_{M/B}(V/B)
• DFS: O((E/B) log V) if E ≥ VB

Sparse graphs E = O(V)
• CC, MST: O(sort(V)) if graph closed under edge contraction
• BFS: O(V/√B)
• SSSP: O((V/√B) · √(log(W/w)))
• DFS: O(V)

O(sort(V)) BFS, DFS, SSSP on planar graphs, outerplanar graphs, grid graphs, bounded-tree-width graphs

Page 58:

MST Contraction Step

Used in PRAM MST algorithms [CLC’82]
• Each vertex selects its lightest adjacent edge
• Lemma: Each selected edge must be part of MST
• The selected edges are contracted:
  • Number of resulting vertices at most V/2
  • Note: contraction does not reduce the number of edges

MST contraction step in O(sort(E)) I/Os
• Finding the representative of a super-vertex [ABT’01]
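One contraction step can be sketched in internal memory as below. It is an illustration under stated assumptions, not the O(sort(E)) external procedure: edges are (weight, u, v) tuples with distinct weights, every vertex has at least one edge, and an in-memory union-find plays the role of finding super-vertex representatives.

```python
def contraction_step(vertices, edges):
    """One Boruvka-style step on edges given as (weight, u, v), distinct weights.

    Returns (selected MST edges, representative of each vertex, contracted edges).
    """
    lightest = {}
    for e in edges:
        _, u, v = e
        for x in (u, v):
            if x not in lightest or e < lightest[x]:
                lightest[x] = e              # lightest edge adjacent to x
    selected = set(lightest.values())

    parent = {v: v for v in vertices}        # union-find over the selected edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, u, v in selected:
        parent[find(u)] = find(v)
    rep = {v: find(v) for v in vertices}

    # Relabel endpoints by super-vertex and drop edges inside a super-vertex.
    contracted = [(w, rep[u], rep[v]) for w, u, v in edges if rep[u] != rep[v]]
    return selected, rep, contracted

verts = ["a", "b", "c", "d"]
es = [(1, "a", "b"), (2, "c", "d"), (10, "b", "c")]
sel, rep, rest = contraction_step(verts, es)
print(sorted(sel), len(set(rep.values())), rest)
```

In the example the four vertices collapse into two super-vertices and only the edge of weight 10 survives between them.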

Page 59:

I/O-Efficient MST

• Graph contraction algorithm can be improved by grouping the contraction steps in super-steps
• Each super-step in O(sort(E) + sort(V)) I/Os
• Basic idea: in order to perform k contraction steps, need to know only the 2^k lightest edges adjacent to each node
  ⇒ each super-step works with a subset of the edges
  ⇒ nb contraction steps × subset of edges = O(V)

[Figure: the log(V/V’) contraction steps grouped into log log(V/V’) super-steps]

⇒ O(sort(E) · log log(VB/E)) I/Os

Page 60:

A direct O(sort(N) log N) DFS Algorithm on Planar Undirected Graphs

Divide-and-conquer using cycle separators [PRAM DFS, Smith86]

Algorithm
• Compute a cycle separator C and path P
• Compute DFS recursively in the connected components Gi of G\P
• Attach the DFS trees of Gi onto the cycle

I/O-analysis
• O(log N) recursive steps
• O(sort(N)) I/Os per step
  • simple O(sort(N)) algorithm for finding a cycle separator
⇒ O(sort(N) log N) I/Os in total

Page 61:

Planar DFS

Denote
• Gi = union of the boundaries of faces at level <= i
• Hi = Gi \ Gi-1
• Ti = DFS-tree of Gi

Structure of levels is simple
• The bicomps of the Hi are the boundary cycles of Gi

[Figure: levels H0, H1, H2 around the source vertex s]

Page 62:

Planar DFS

Algorithm: Compute DFS of Hi and attach it onto Ti-1

Attaching onto Ti-1:

[Figure: DFS numbering (1 to 27) of the vertices in levels H0, H1, H2, starting at s]

Page 63:

Planar DAGs: Summary and Open Problems

If the B²-division is given
• Topological sort can be computed in O(scan(N)) I/Os
• Extends to BFS and SSSP

Simplified O(scan(N)) algorithms for planar DAGs

[Figure: from a B²-division, topological sort, SSSP and BFS in scan(N) I/Os; DFS: ??]

Page 64:

Massive Terrain Data

Remote sensing technology
• Massive amounts of terrain data
• Higher resolutions (1km, 100m, 30m, 10m, 1m,…)

NASA-SRTM
• Mission launched in 2001
• Acquired data for 80% of earth at 30m resolution
• 5TB

USGS
• Most of US at 10m resolution

LIDAR
• 1m res

Page 65:

Uses

Flow direction and flow accumulation are used for:

Computing other hydrological attributes
• river network
• moisture indices
• watersheds and watershed divides

Analysis and prediction of sediment and pollutant movement in landscapes.

Decision support in land management, flood and pollution prevention and disaster management

Page 66:

Standard FA Algorithm

Algorithm:
• Input: flow direction grid FD
• Output: flow accumulation grid FA (initialized to 1)
• Process (sweep) cells in topological order. For each cell:
  • Read flow from FA grid and direction from FD grid
  • Update flow in FA grid for downslope neighbors

Analysis
• One sweep enough: O(sort) + O(N) time for a grid of N cells,
• ...but O(N) I/Os: cells in topological order are distributed over the terrain

Page 67:

I/O-Efficient FA Algorithm

Eliminating scattered accesses to FD grid
• Store FD grid in topological order

Eliminating scattered accesses to FA grid by replacing them with accesses to a p-queue
• Idea: Flow to neighbor is only needed when neighbor is processed
  • time when cell is processed = topological rank = priority
• Push flow by Insert-ing a flow increment in p-queue with priority equal to neighbor’s time
• Flow of cell obtained using DeleteMin
• Note: Augment each cell with priorities of 8 neighbors
• Obs: Space (~9N) traded for I/O

The FA grid can be computed in O(sort(N)) I/Os.
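The priority-queue formulation can be sketched in internal memory as follows. This is a hedged illustration, not the TerraFlow implementation: cells are assumed to arrive already in topological order, each with a hypothetical list of (downslope neighbor, fraction) pairs, and Python's heapq stands in for the external priority queue.

```python
import heapq

def flow_accumulation(order, downslope):
    """order: cells in topological order (upslope first).
    downslope: dict cell -> list of (neighbor, fraction of flow sent)."""
    rank = {cell: t for t, cell in enumerate(order)}   # processing time = priority
    pq = []                                            # (time of receiver, flow increment)
    fa = {}
    for t, cell in enumerate(order):
        flow = 1.0                                     # one unit of "rain" per cell
        while pq and pq[0][0] == t:                    # DeleteMin everything sent to this cell
            flow += heapq.heappop(pq)[1]
        fa[cell] = flow
        for nbr, frac in downslope.get(cell, []):
            heapq.heappush(pq, (rank[nbr], flow * frac))   # Insert with neighbor's time
    return fa

order = ["a", "d", "b", "c"]
downslope = {"a": [("b", 1.0)], "b": [("c", 1.0)], "d": [("c", 1.0)]}
print(flow_accumulation(order, downslope))
```

The FA grid is never read at a random position: each cell's incoming flow is delivered by the queue exactly when the sweep reaches that cell.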


Page 68:

GRASS:> r.terraflow help

Description:
  Flow computation for massive grids.

Usage:
  r.terraflow [-sq] elev=name filled=name direction=name watershed=name accumulation=name tci=name [d8cut=value] [memory=value] [STREAM_DIR=name] [stats=name]

Flags:
  -s  SFD (D8) flow (default is MFD)
  -q  Quiet

Parameters:
  elev          Input elevation grid
  filled        Output (filled) elevation grid
  direction     Output direction grid
  watershed     Output watershed grid
  accumulation  Output accumulation grid
  tci           Output tci grid
  d8cut         If flow accumulation is larger than this value it is routed using SFD (D8) direction (meaningfull only for MFD flow only). default: infinity
  memory        Main memory size (in MB). default: 300
  STREAM_DIR    Location of intermediate STREAMs. default: /var/tmp
  stats         Stats file. default: stats.out

Page 69:

Flat DEM

Page 70:

r.terraflow MFD

Page 71:

r.terraflow SFD

Page 72:

r.watershed

Page 73:

r.terraflow MFD zoom,2D

Page 74:

r.terraflow SFD zoom,2D

Page 75:

It’s Growing!

Appalachian Mountains

Area is approx. 800 km x 800 km

Sampled at:
• 100m resolution: 64 million points (128MB)
• 30m resolution: 640 million points (1.2GB)
• 10m resolution: 6400 million = 6.4 billion points (12GB)
• 1m resolution: 640 billion points (1.2TB)
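The growth in point counts follows from squaring the number of samples per side. A small sketch of the arithmetic, assuming a square 800 km x 800 km area and 2 bytes per point (an assumption inferred from 64 million points being listed as 128MB):

```python
def grid_points(side_km: float, resolution_m: float) -> float:
    """Number of samples on a square side_km x side_km area at the given spacing."""
    per_side = side_km * 1000 / resolution_m
    return per_side ** 2

BYTES_PER_POINT = 2   # assumption: 64 million points ~ 128MB

for res in (100, 30, 10, 1):
    n = grid_points(800, res)
    print(f"{res:>4} m: {n:.3g} points, {n * BYTES_PER_POINT / 1e9:.3g} GB")
```

The exact count at 30m comes out somewhat above the rounded 640 million figure, which appears to treat the step from 100m to 30m as a factor of ten.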