
I/O and Space-Efficient Path Traversal in Planar Graphs

Craig Dillabaugh, Carleton University

Meng He, University of Waterloo

Anil Maheshwari, Carleton University

Norbert Zeh, Dalhousie University

Background: Succinct Data Structures

What are succinct data structures? (Jacobson 1989)

Representing data structures using, ideally, the information-theoretic minimum space

Supporting efficient navigational operations

Why succinct data structures?

Large data sets in modern applications: textual, genomic, spatial or geometric

Background: External Memory Model

Parameters

N: number of elements in the problem instance

M: size of the internal memory

B: size of a disk block

Cost: number of I/O’s (block transfers) between internal memory and external memory

Aggarwal and Vitter 1988

[Figure: the external memory model, showing the CPU, internal memory, and external memory, with data transferred in blocks]
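The cost measure can be illustrated with a toy simulation. The sketch below is not from the talk: `BlockedArray` and `scan_cost` are hypothetical names, and internal memory is assumed to hold a single block. It only shows that scanning N elements stored contiguously touches ⌈N/B⌉ blocks.

```python
import math


class BlockedArray:
    """Toy external-memory array (hypothetical helper, not from the talk).

    Elements live on "disk" in blocks of `block_size` items; internal
    memory is assumed to hold exactly one block at a time.
    """

    def __init__(self, data, block_size):
        self.data = list(data)
        self.block_size = block_size
        self.cached_block = None  # index of the block currently in memory
        self.io_count = 0         # number of block transfers so far

    def read(self, i):
        block = i // self.block_size
        if block != self.cached_block:
            # The element is not in memory: one I/O brings in its block.
            self.cached_block = block
            self.io_count += 1
        return self.data[i]


def scan_cost(n, block_size):
    """Count the I/Os needed to read n elements in order."""
    arr = BlockedArray(range(n), block_size)
    for i in range(n):
        arr.read(i)
    return arr.io_count


if __name__ == "__main__":
    n, b = 1000, 64
    print(scan_cost(n, b), math.ceil(n / b))
```

A sequential scan is the best case; an access pattern that jumps to a different block at every step pays one I/O per element, which is the behaviour a path traversal has to avoid.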

Our Contributions

Our goal is to design data structures that are both succinct and efficient in the External Memory setting

Our results

A succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal

A succinct representation of triangulated terrains that supports various geometric queries

Notation

N: number of vertices of the given graph G

d: maximum degree of vertices

q: number of bits required to encode the key of each vertex

K: the length of the path


Two-Level Partition

A tool: graph separator (Frederickson 1987)

Size of each subgraph (region): r

Number of regions: Θ(N/r)

Number of boundary vertices: O(N / r^(1/2))

Two-level partition

Subdivide G into regions of fixed maximum size

Subdivide each region into sub-regions of smaller fixed maximum size

Types of vertices for each region / sub-region: interior vertices and boundary vertices

α-Neighbourhood

Definition

Beginning with a given vertex v, we perform a breadth-first search in G and select the first α vertices encountered

The α-neighbourhood of v is the subgraph of G induced by these vertices

Internal and terminal vertices

Property: the distance between v and any terminal vertex in its α-neighbourhood is at least log_d α

In our representation, we store the α-neighbourhood of each boundary vertex. If a sub-region boundary vertex is interior to a region, we add the constraint that its α-neighbourhood cannot extend beyond the region
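The definition can be sketched in a few lines of Python. This is an illustration only, with several assumptions: the graph is an adjacency dictionary, ties in the breadth-first search are broken by adjacency-list order, and a vertex is treated as terminal when at least one of its neighbours lies outside the α-neighbourhood (the slides name the two vertex types without defining them). The function names and the `path_graph` helper are hypothetical.

```python
from collections import deque


def alpha_neighbourhood(adj, v, alpha):
    """Return the first `alpha` vertices discovered by a BFS from v."""
    seen = {v}
    order = [v]
    queue = deque([v])
    while queue and len(order) < alpha:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                order.append(w)
                queue.append(w)
                if len(order) == alpha:
                    break  # enough vertices selected
    return order


def split_internal_terminal(adj, members):
    """Assumed definition: internal = all neighbours inside, else terminal."""
    inside = set(members)
    internal = [u for u in members if all(w in inside for w in adj[u])]
    terminal = [u for u in members if any(w not in inside for w in adj[u])]
    return internal, terminal


def path_graph(n):
    """Hypothetical test graph: vertices 0..n-1 joined in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    adj = path_graph(7)
    members = alpha_neighbourhood(adj, 3, 5)
    print(members, split_internal_terminal(adj, members))
```

On the 7-vertex line with v = 3 and α = 5, the two end vertices of the selected segment are terminal, and both are two steps away from v, so a traversal starting at v can take at least two steps before it needs another component.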

Overview of Labeling Scheme

Labels at three levels for the same vertex

Graph-label (unique)

Region-label (one or more)

Subregion-label (one or more)

Assign the labels bottom-up

Sub-Region Labels

Encode sub-region R_{i,j} using any succinct representation for planar graphs

This induces a permutation of the vertices in R_{i,j}

Subregion-label: the kth vertex in the above permutation has subregion-label k in R_{i,j}

Region-Labels and Graph-Labels

[Figure: region R_1 and its sub-regions R_{1,1}, R_{1,2}, R_{1,3}. Subregion-label sequences shown: 1 to 6, 1 to 5, and 1 to 7. Region-label sequences shown for R_1: 1 to 6, then 7 to 15, …]

The assignment of graph-labels is similar

Succinct structures of o(N) bits are constructed to support conversion between labels at different levels in O(1) I/O’s
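The conversion between label levels can be pictured with a non-succinct stand-in. The sketch below assumes that region-labels are obtained by laying the sub-regions of a region end to end, which is one reading of the figure and not something the slides state outright. A plain prefix-sum list with binary search replaces the o(N)-bit structures, and `RegionLabelMap` is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate


class RegionLabelMap:
    """Illustrative label conversion for one region (not the succinct version)."""

    def __init__(self, subregion_sizes):
        # offsets[j] = number of region-labels used by sub-regions before j
        self.offsets = [0] + list(accumulate(subregion_sizes))

    def to_region_label(self, j, k):
        """Sub-region index j (0-based) and subregion-label k (1-based)."""
        size = self.offsets[j + 1] - self.offsets[j]
        if not 1 <= k <= size:
            raise ValueError("subregion-label out of range")
        return self.offsets[j] + k

    def to_subregion_label(self, r):
        """Region-label r (1-based) -> (sub-region index, subregion-label)."""
        if not 1 <= r <= self.offsets[-1]:
            raise ValueError("region-label out of range")
        j = bisect_right(self.offsets, r - 1) - 1
        return j, r - self.offsets[j]


if __name__ == "__main__":
    labels = RegionLabelMap([6, 5, 7])  # sub-region sizes chosen for illustration
    print(labels.to_region_label(1, 1), labels.to_subregion_label(18))
```

Because a boundary vertex belongs to several sub-regions, it receives one region-label per occurrence under this scheme, which matches the "one or more" labels per vertex in the overview.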

Data Structures

Denote by A the maximum number of vertices that may be stored in a block; this is our maximum sub-region size

Choose A lg^3 N to be the maximum size of each region

We only encode sub-regions and α-neighbourhoods of boundary vertices as components

Encode the graph structure of each component in a succinct fashion

Information is encoded so that we can retrieve the graph-labels of the internal vertices in an α-neighbourhood without requiring additional I/O’s

Space Analysis

We assume B = Ω(lg N)

A = (B lg N) / (c + q)

c: number of bits per vertex required to encode the sub-graph structure and the boundary bit vector

Choose α = A^(1/3)

Intuitively, our structures are space-efficient because:

Region boundary vertices are few enough that information such as the graph-labels of the vertices in their α-neighbourhoods does not occupy too much space

The number of sub-region boundary vertices is larger, but information such as region-labels uses fewer bits (lg(A lg^3 N))

Total space: O(N) + Nq + o(Nq) bits
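To get a feel for the parameter choices, the formulas can be evaluated for sample values. The numbers below (N = 2^30, B = 1024, c = 10, q = 22) are hypothetical and chosen only to make the arithmetic clean. The sketch also assumes that B lg N is the number of bits in a block, which is a reading of the formula for A and not a statement from the slides.

```python
import math


def partition_parameters(n, b, c, q):
    """Evaluate the parameter formulas from the space analysis.

    n: number of vertices, b: block size (assumed to be in words of lg n bits),
    c: structure bits per vertex, q: key bits per vertex.
    """
    lg_n = math.log2(n)
    a = int(b * lg_n // (c + q))      # A: vertices per block = max sub-region size
    alpha = a ** (1 / 3)              # alpha = A^(1/3)
    region_size = a * lg_n ** 3       # A lg^3 N: max region size
    return {
        "A": a,
        "alpha": alpha,
        "region_size": region_size,
        "region_label_bits": math.log2(region_size),
        "graph_label_bits": lg_n,
    }


if __name__ == "__main__":
    for name, value in partition_parameters(2 ** 30, 1024, 10, 22).items():
        print(name, value)
```

With these values a block holds A = 960 vertices and a region holds at most 25,920,000, so a region-label needs fewer bits than the 30 bits of a graph-label. That gap is what lets the more numerous sub-region boundary vertices be handled within the space bound.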

Traversal Algorithm

Load either a sub-region or the α-neighbourhood of a boundary vertex

Traverse the above component until a boundary/terminal vertex is encountered

Load the next component from external memory and continue the traversal
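The three steps above can be simulated to count component loads. This is a simplified model, not the authors' implementation: each vertex has a "home" component in which all of its neighbours are present (its sub-region if it is interior, its own α-neighbourhood if it is a boundary vertex), and each load is assumed to cost O(1) I/Os. All names and the example partition are hypothetical.

```python
def count_component_loads(path, home_component, complete_in):
    """Count how many components are loaded while walking `path`.

    home_component[u]: id of the component to load when stuck at u.
    complete_in[cid]:  vertices whose full adjacency is inside component cid
                       (interior vertices of a sub-region, or internal
                       vertices of an alpha-neighbourhood).
    """
    loads = 0
    current = None
    for u in path:
        # A boundary/terminal vertex cannot be expanded from the current
        # component, so the component that fully contains it is loaded.
        if current is None or u not in complete_in[current]:
            current = home_component[u]
            loads += 1
    return loads


def example_instance():
    """Line graph 0..11 split at boundary vertex 5 (illustrative only)."""
    complete_in = {
        "S0": {0, 1, 2, 3, 4},         # interior of sub-region {0..5}
        "S1": {6, 7, 8, 9, 10, 11},    # interior of sub-region {5..11}
        "N5": {4, 5, 6},               # internal vertices of the 5-neighbourhood of 5
    }
    home_component = {u: "S0" for u in range(5)}
    home_component.update({u: "S1" for u in range(6, 12)})
    home_component[5] = "N5"
    return home_component, complete_in


if __name__ == "__main__":
    home, complete = example_instance()
    print(count_component_loads(list(range(12)), home, complete))
```

In this example the walk from vertex 0 to vertex 11 loads the first sub-region, then the α-neighbourhood of the boundary vertex, then the second sub-region. The α-neighbourhood is what prevents a path that hugs a boundary from paying one load per step.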

I/O Efficiency

Observations

When encountering a terminal/boundary vertex, the next component can be loaded in O(1) I/O’s

Given a component, the graph labels of all interior/internal vertices can be reported without incurring any additional I/O’s

By loading a constant number of components, we can visit Ω(lg B) vertices along the path

I/O complexity: O(K / lg B)
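One way to connect the third observation to the stated bound is the short calculation below. It is a sketch under added assumptions, not the authors' proof: d and c are taken to be constants, q = O(lg N) so that A = Ω(B), and the case where an α-neighbourhood is cut off at a region boundary is ignored.

```latex
\begin{align*}
\text{steps gained per } \alpha\text{-neighbourhood}
  &\ge \log_d \alpha
   = \log_d A^{1/3}
   = \frac{\lg A}{3 \lg d}, \\
A = \frac{B \lg N}{c + q} = \Omega(B)
  &\;\Longrightarrow\;
  \log_d \alpha = \Omega(\lg B), \\
\text{I/Os for a path of length } K
  &= O(1) \cdot O\!\left(\frac{K}{\lg B}\right)
   = O\!\left(\frac{K}{\lg B}\right).
\end{align*}
```

The first line uses the α-neighbourhood property (terminal vertices are at distance at least log_d α) together with α = A^(1/3); the last line charges O(1) I/Os to every Ω(lg B) steps of the path.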

Main Result

A succinct representation of bounded-degree planar graphs:

Space: O(N) + Nq + o(Nq) bits

I/O complexity for path traversal: O(K / lg B)

Terrains

Modeled as triangulated irregular networks (TINs)

Notation

N: number of points

Φ: number of bits required to store the coordinates of each point

Space: NΦ + O(N) + o(NΦ) bits

I/O complexity: reporting a path crossing K faces takes O(K / lg B) I/O’s

Queries on Triangulated Terrains

Point location: O(log_B N) I/O’s

Terrain profile: O(K / lg B) I/O’s

Trickle path: O(K / lg B) I/O’s

Connected component: O(K / lg B) I/O’s if the component is convex. This can be generalized to components that are not convex, though the result is more complex

Conclusions

We designed a succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal, and applied it to terrains modeled as TINs to support queries

This provides solutions for modern applications that process very large data sets

Future work: combining succinct data structures and external memory data structures for other problems

Thank you!
